Noise reduction frontends have been developed independently for speech communication and for speech recognition, with the result that a single algorithm does not perform well in both application domains. In this paper we show that noise reduction filters based on the discrete Fourier transform (DFT), which are used in speech communication, can also perform well in robust automatic speech recognition (ASR) experiments if some form of feature smoothing is applied.
We analyse the statistics of the Mel frequency cepstral coefficients (MFCCs) that are used as speech features and describe how changes in the mean and variance of these features affect recognition results. It is shown that recognizers are more sensitive to an increase in the variance of enhanced features than to errors in their mean values. We present a method that compensates for the increased variance of DFT-based noise reduction frontends using prior knowledge and smoothing. We achieve high segmental SNR improvements as well as recognition results close to those of the Advanced Frontend (AFE) of the European Telecommunications Standards Institute (ETSI) for all noise types.
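To illustrate why smoothing can reduce feature variance, the following is a minimal sketch and not the method of this paper. It assumes a simple centered moving average over time, a synthetic feature trajectory, and Gaussian frame-to-frame jitter standing in for the increased variance of an enhanced feature. The function names, the window width and the test data are hypothetical choices made for illustration only.

```python
import random
import statistics


def smooth_trajectory(values, width=5):
    """Centered moving average over `width` frames (assumed smoother).

    Near the edges the window is truncated, so fewer frames are averaged.
    """
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def make_noisy_trajectory(num_frames=200, jitter_std=1.0, seed=0):
    """Hypothetical data: a slowly varying coefficient plus random jitter."""
    rng = random.Random(seed)
    clean = [0.01 * t for t in range(num_frames)]
    noisy = [c + rng.gauss(0.0, jitter_std) for c in clean]
    return clean, noisy


def error_variance(estimate, reference):
    """Population variance of the frame-wise error against the reference."""
    errors = [e - r for e, r in zip(estimate, reference)]
    return statistics.pvariance(errors)


clean, noisy = make_noisy_trajectory()
smoothed = smooth_trajectory(noisy, width=5)

var_before = error_variance(noisy, clean)
var_after = error_variance(smoothed, clean)
print(f"error variance before smoothing: {var_before:.3f}")
print(f"error variance after smoothing:  {var_after:.3f}")
```

In this toy setting the error variance drops after smoothing because independent jitter is averaged over several frames, while the slowly varying underlying trajectory is largely preserved. A wider window reduces the variance further but blurs faster feature dynamics, which is the trade-off any such smoother has to balance.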