Acoustic cues to the distinction between sibilant fricatives are claimed to be invariant across languages. In [1], Evers et al. present a method for automatically distinguishing [s] from [ʃ], using the slopes of regression lines fitted over separate frequency ranges of a DFT spectrum. They report accuracy rates in excess of 90% for fricatives extracted from recordings of minimal pairs in English, Dutch and Bengali. These findings are broadly replicated by [2], using VCV tokens recorded in the lab. We tested the algorithm from [1] on fricative tokens extracted from the TIMIT corpus of American English read speech and the Kiel corpora of German.
We achieved accuracy rates similar to those reported in [1] and [2], with two caveats: (1) the measure requires a DFT spectrum covering 0 to 8 kHz, and since a sampled signal contains frequencies only up to half its sampling rate, a minimum sampling rate of 16 kHz is necessary for it to be effective; and (2) although the measure separates [s] and [ʃ] as clearly as in previous studies, the absolute value of the threshold between the two sounds is sensitive to the dynamic range of the input signal.
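As a rough illustration of the kind of measure described above, the following Python sketch fits regression lines to a dB spectrum over two frequency ranges and returns their slopes. It is a minimal sketch, not the implementation of [1]: the 2.5 kHz boundary between the two ranges, the Hamming window, and the names `band_slope` and `slope_features` are assumptions made here for illustration. No decision threshold is included, since its absolute value would have to be calibrated on the recordings at hand.

```python
import numpy as np


def band_slope(freqs_hz, level_db, lo_hz, hi_hz):
    """Least-squares slope (dB per kHz) of the spectrum between lo_hz and hi_hz."""
    mask = (freqs_hz >= lo_hz) & (freqs_hz <= hi_hz)
    slope, _intercept = np.polyfit(freqs_hz[mask] / 1000.0, level_db[mask], 1)
    return float(slope)


def slope_features(frame, fs, split_hz=2500.0, top_hz=8000.0):
    """Return (low-band slope, high-band slope, their difference) for one frame.

    split_hz and top_hz are illustrative assumptions, not values from the abstract.
    """
    # The spectrum only reaches fs / 2, so the upper band needs fs >= 2 * top_hz.
    if fs < 2 * top_hz:
        raise ValueError("sampling rate too low to cover the upper frequency range")

    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    level_db = 20.0 * np.log10(magnitude + 1e-12)  # small offset avoids log(0)
    freqs_hz = np.fft.rfftfreq(len(frame), d=1.0 / fs)

    low = band_slope(freqs_hz, level_db, 0.0, split_hz)
    high = band_slope(freqs_hz, level_db, split_hz, top_hz)
    return low, high, low - high


# Usage on a synthetic noise frame (a stand-in for a real fricative token).
rng = np.random.default_rng(0)
low, high, diff = slope_features(rng.standard_normal(512), fs=16000)
```

A classifier built on these features would compare the slopes, or their difference, against a threshold estimated from labelled tokens recorded under the same conditions.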
[1] V. Evers, H. Reetz, and A. Lahiri, “Crosslinguistic acoustic categorization of sibilants independent of phonological status,” Journal of Phonetics, vol. 26, pp. 345–370, 1998.
[2] K. Maniwa, A. Jongman, and T. Wade, “Acoustic characteristics of clearly spoken English fricatives,” Journal of the Acoustical Society of America, vol. 125, pp. 3962–3973, 2009.