ISCA Archive ICSLP 1996

Analysis of speech segments using variable spectral/temporal resolution

Xihong Wang, Stephen A. Zahorian, Stefan Auberg

In this paper, we present an approach for efficiently computing a compact temporal/spectral feature set for representing a segment of speech, with effective resolution depending on both frequency and time position within the segment. The goal is to mimic the resolution properties of the human auditory system, but using a computationally efficient FFT-based front end rather than a more complex auditory model. In particular, we apply both frequency and time "warping" to FFT spectra to obtain good frequency resolution at low frequencies and good time resolution at high frequencies. Time resolution is also varied so that the center of the segment is better represented than the endpoints. The resolution can be varied through the selection of "warping" functions, each controlled by a small number of parameters. The method was experimentally verified on the classification of six stop consonants extracted from the TIMIT continuous speech database. The best classification rate obtained on test data was 81.2%, using 50 features computed with the method presented.
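To make the idea of warping concrete, the following is a minimal sketch and not the authors' implementation. It assumes a logarithmic frequency warp and a Gaussian-density time warp, both chosen here purely for illustration, and expands a block of log-magnitude FFT spectra over cosine basis functions defined on the warped axes. The function names, the parameters `a` and `width`, and the 10 x 5 split of the features are all hypothetical; the abstract does not specify the warping functions or how the 50 features are divided between time and frequency. The sketch also uses separable warps, so it does not reproduce the frequency-dependent time resolution described above.

```python
import numpy as np

def freq_warp(f, a=10.0):
    """Assumed log warp: maps normalized frequency in [0, 1] onto [0, 1],
    with the steepest slope (finest resolution) at low frequencies."""
    return np.log1p(a * f) / np.log1p(a)

def time_warp(t, width=0.25):
    """Assumed warp built from a Gaussian density centered on the segment,
    so the warped axis advances fastest near the segment center."""
    density = np.exp(-0.5 * ((t - 0.5) / width) ** 2)
    cum = np.cumsum(density)
    return (cum - cum[0]) / (cum[-1] - cum[0])

def warped_features(spec, n_time=10, n_freq=5, a=10.0, width=0.25):
    """Expand a (n_frames, n_bins) array of log-magnitude spectra over
    cosine basis functions evaluated on the warped time and frequency axes.
    Returns an (n_time, n_freq) feature array."""
    n_frames, n_bins = spec.shape
    fw = freq_warp(np.linspace(0.0, 1.0, n_bins), a)
    tw = time_warp(np.linspace(0.0, 1.0, n_frames), width)

    # Local slope of each warp acts as a weight: regions where the warp
    # is steep contribute more, which is what gives them higher resolution.
    df = np.gradient(fw)
    dt = np.gradient(tw)

    basis_f = np.cos(np.pi * np.outer(np.arange(n_freq), fw)) * df
    basis_t = np.cos(np.pi * np.outer(np.arange(n_time), tw)) * dt
    return basis_t @ spec @ basis_f.T
```

Under these assumptions, a segment of 40 frames with 129 FFT bins per frame would be reduced to a 10 x 5 array, that is, 50 features. Changing `a` or `width` changes where resolution is concentrated without altering the rest of the computation, which reflects the small-parameter control over resolution that the abstract describes.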