The goal of this work is to improve the robustness of speech recognition systems in additive noise and real-time reverberant environments. In this paper we present a compressive gammachirp filter-bank-based feature extractor that incorporates a method for enhancing the auditory spectrum and a short-time feature normalization technique, which, by adjusting the scale and mean of the cepstral features, reduces the mismatch between cepstra in the training and test environments. To evaluate the proposed feature extractor in speech recognition, we use the standard noisy AURORA-2 corpus and the meeting recorder digits (MRD) subset of the AURORA-5 corpus, which represent additive noise and reverberant acoustic conditions, respectively. The ETSI advanced front-end (ETSI-AFE), the recently proposed power normalized cepstral coefficients (PNCC), and conventional MFCC features are used for comparison. Experimental speech recognition results show that the proposed method is robust in both additive noise and reverberant environments. The proposed method provides results comparable to those of ETSI-AFE and PNCC on the AURORA-2 corpus and provides considerable improvements over the other feature extractors on the AURORA-5 corpus.
Index Terms: speech recognition, compressive gammachirp, auditory spectrum enhancement, feature normalization
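
As an illustration of the short-time feature normalization idea mentioned in the abstract (adjusting the mean and scale of cepstral features), the following is a minimal sketch of a generic sliding-window mean and scale normalization. The abstract does not specify the exact formulation, so the function name `short_time_normalize`, the centered window, the default `window` length, and the use of the standard deviation as the scale are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def short_time_normalize(cepstra, window=301, eps=1e-8):
    """Normalize each cepstral coefficient over a sliding window of frames.

    cepstra: array of shape (num_frames, num_coeffs).
    window:  hypothetical window length in frames, centered on each frame.
    eps:     small constant that guards against division by zero.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    num_frames = cepstra.shape[0]
    half = window // 2
    out = np.empty_like(cepstra)
    for t in range(num_frames):
        # Clip the window at the utterance boundaries.
        lo = max(0, t - half)
        hi = min(num_frames, t + half + 1)
        segment = cepstra[lo:hi]
        mean = segment.mean(axis=0)   # short-time mean per coefficient
        scale = segment.std(axis=0)   # short-time scale per coefficient (assumed: std)
        out[t] = (cepstra[t] - mean) / (scale + eps)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "cepstra" with a nonzero offset and gain, 13 coefficients per frame.
    feats = 3.0 * rng.standard_normal((200, 13)) + 5.0
    normed = short_time_normalize(feats, window=101)
    print(normed.shape)
```

When the window is longer than the utterance, this sketch reduces to ordinary per-utterance mean and variance normalization; a shorter window lets the statistics track slowly varying channel or noise conditions.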