ISCA Archive ICSLP 1996

Spectral estimation and normalisation for robust speech recognition

Tom Claes, Fei Xie, Dirk van Compernolle

Speech recognition in adverse conditions remains a difficult and challenging problem. It has already been shown [1] that normalising the dynamic range (SNR) of the frequency channels in a mel-scale triangular filterbank (MFCC) [2] improves robustness against both additive and convolutional noise. Nevertheless, because the method is based on a masking technique, the improvement is small when the SNR is lower than the target (normalised) SNR. A solution to this problem is to enhance the filterbank energies before the masking technique is applied. For this purpose we developed a Non-linear Spectral Estimator (NSE) for speech recognition that operates on the log filterbank energies. NSE enhances these filterbank energies and makes SNR-normalisation effective also at very low SNRs. Experimental results are given on the NOISEX-92 [3] database. Better recognition performance is seen even at 0 dB SNR.
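
To illustrate why a masking-based normalisation helps little when a channel's SNR is already below the target, here is a minimal sketch. It is not the method of [1]: it assumes that the dynamic range of a channel is its peak log energy minus its lowest log energy (in dB), and that masking means raising every value below (peak - target SNR) to that floor. The function name `normalise_dynamic_range`, the parameter `target_snr_db`, and the example values are hypothetical.

```python
def normalise_dynamic_range(log_energies, target_snr_db):
    """Floor each channel at (channel peak - target SNR), all in dB.

    log_energies: list of frames, each a list of per-channel log energies (dB).
    Hypothetical helper illustrating masking; not the algorithm of [1].
    """
    n_channels = len(log_energies[0])
    # Peak level of each channel over the whole utterance.
    peaks = [max(frame[c] for frame in log_energies) for c in range(n_channels)]
    # Masking floor per channel: values below it are raised to the floor.
    floors = [peak - target_snr_db for peak in peaks]
    return [
        [max(frame[c], floors[c]) for c in range(n_channels)]
        for frame in log_energies
    ]


# Assumed toy data: channel 0 spans 40 dB, channel 1 only 10 dB.
frames = [[60.0, 50.0], [20.0, 40.0], [45.0, 42.0]]
masked = normalise_dynamic_range(frames, target_snr_db=20.0)
```

In this toy case the first channel is compressed to the 20 dB target, while the second channel, whose range is already below the target, is left unchanged by the masking step. That is the situation in which enhancing the filterbank energies beforehand becomes relevant.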