ISCA Archive Interspeech 2012

Hooking up spectro-temporal filters with auditory-inspired representations for robust automatic speech recognition

Bernd T. Meyer, Constantin Spille, Birger Kollmeier, Nelson Morgan

Spectro-temporal filtering has previously been shown to yield features that increase the robustness of automatic speech recognition (ASR). We replace the spectro-temporal representation used in previous work with spectrograms that incorporate knowledge about signal processing in the human auditory system and that are derived from Power-Normalized Cepstral Coefficients (PNCCs). 2D Gabor filters are applied to these spectrograms to extract features, which are evaluated on a noisy digit recognition task. The filter bank is adapted to the new representation by optimizing the spectral modulation frequencies associated with each Gabor function. A comparison of the optimized parameters with the spectral modulation of vowels shows a good match between the optimized and the expected range of frequencies. When processed with a non-linear neural net and combined with PNCCs, Gabor features decrease the error rate by at least 19% relative to the baseline and to PNCCs.

Index Terms: automatic speech recognition, spectro-temporal features, power-normalized features
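
As an illustration of the 2D Gabor filtering step described in the abstract, the following is a minimal sketch and not the implementation used in this work. It assumes a complex plane-wave carrier multiplied by a Hann envelope, zero padding at the spectrogram borders, and hypothetical names (`gabor_2d`, `filter_spectrogram`) and filter sizes chosen only for the example. The input is any time-frequency representation stored as an array of shape (channels, frames); the PNCC-derived spectrogram itself is not reproduced here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_2d(omega_k, omega_n, size_k, size_n):
    """Complex 2D Gabor filter (assumed form: Hann envelope times carrier).

    omega_k: spectral modulation frequency in radians per channel
    omega_n: temporal modulation frequency in radians per frame
    size_k, size_n: odd filter extents in channels and frames
    """
    # Centered index grids so the carrier phase is zero at the filter center.
    k = np.arange(size_k) - (size_k - 1) / 2.0
    n = np.arange(size_n) - (size_n - 1) / 2.0
    envelope = np.outer(np.hanning(size_k), np.hanning(size_n))
    carrier = np.exp(1j * (omega_k * k[:, None] + omega_n * n[None, :]))
    return envelope * carrier


def filter_spectrogram(spec, kernel):
    """Correlate a (channels, frames) spectrogram with a 2D kernel.

    Borders are zero padded so the output has the same shape as the input.
    The complex response is returned; callers may take the real part or
    the magnitude, depending on the feature definition they assume.
    """
    size_k, size_n = kernel.shape
    padded = np.pad(spec, ((size_k // 2, size_k // 2),
                           (size_n // 2, size_n // 2)))
    # windows[i, j] is the patch of the padded input centered on (i, j).
    windows = sliding_window_view(padded, kernel.shape)
    return np.tensordot(windows, kernel, axes=([2, 3], [0, 1]))


# Toy input: a pure spectral ripple with a period of 8 channels.
channels, frames = 32, 40
ripple = 2.0 * np.pi / 8.0
toy_spec = np.cos(ripple * np.arange(channels))[:, None] * np.ones((1, frames))

matched = filter_spectrogram(toy_spec, gabor_2d(ripple, 0.0, 17, 9))
mismatched = filter_spectrogram(toy_spec, gabor_2d(3.0 * ripple, 0.0, 17, 9))
```

On this toy input, the filter whose spectral modulation frequency matches the ripple responds much more strongly at the center of the spectrogram than the mismatched one, which is the property that motivates tuning the spectral modulation frequencies of the filter bank to the representation.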