Expressing noisy speech spectra as a linear combination of speech and noise exemplars has been shown to be a powerful tool for noise-robust automatic speech recognition (ASR). Such a model has been used both for feature enhancement (FE) and to directly provide noise-robust speech state probabilities, using a method called sparse classification (SC). The goal of this work is threefold: first, we integrate various SC advances recently proposed in the literature; second, we improve upon the results obtained with FE through retraining and multi-condition training of the acoustic models used in the recognizer; and finally, we propose a single hybrid SC-FE system. In our experiments on AURORA-2, we obtain an average word error rate (WER) of 3% on matched noise types and 5% on mismatched noise types.
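The exemplar model above can be illustrated with a minimal sketch: an observed spectrum y is approximated as y ≈ A_s x_s + A_n x_n with nonnegative weights, where the columns of A_s and A_n are speech and noise exemplars. The sketch assumes nonnegative, magnitude-like spectra and estimates the weights by minimizing a KL-divergence cost with an L1 sparsity penalty via standard multiplicative updates. FE is shown as a Wiener-style mask built from the speech and noise reconstructions, and SC as summing speech-exemplar weights per state label. All function names, the penalty weight `lam`, the iteration count, and the toy data are hypothetical illustrations, not the configuration used in this work.

```python
# Minimal sketch (assumptions, not the paper's exact method): nonnegative,
# magnitude-like spectra; KL-divergence cost with an L1 sparsity penalty,
# minimized by multiplicative updates; toy exemplars and state labels.

def reconstruct(exemplars, x):
    """Return sum_k x[k] * exemplars[k], where each exemplar is one spectrum."""
    dim = len(exemplars[0])
    return [sum(x[k] * exemplars[k][d] for k in range(len(exemplars)))
            for d in range(dim)]


def sparse_activations(y, exemplars, lam=0.01, iters=500, eps=1e-12):
    """Find x >= 0 with y ~ A x by minimizing KL(y || A x) + lam * sum(x)."""
    x = [1.0] * len(exemplars)
    for _ in range(iters):
        y_hat = reconstruct(exemplars, x)
        ratio = [yd / (yh + eps) for yd, yh in zip(y, y_hat)]
        # Multiplicative update: x_k *= (a_k^T ratio) / (a_k^T 1 + lam).
        # A new list is built, so all weights are updated simultaneously.
        x = [xk * sum(a * r for a, r in zip(ak, ratio)) / (sum(ak) + lam)
             for xk, ak in zip(x, exemplars)]
    return x


def enhance_and_classify(y, speech_ex, noise_ex, labels, n_states, eps=1e-12):
    """FE: scale y by the estimated speech share; SC: sum speech weights per state."""
    x = sparse_activations(y, speech_ex + noise_ex)
    xs, xn = x[:len(speech_ex)], x[len(speech_ex):]
    s_hat = reconstruct(speech_ex, xs)
    n_hat = reconstruct(noise_ex, xn)
    # Wiener-style mask in [0, 1): keeps the part of y explained by speech.
    enhanced = [yd * s / (s + n + eps) for yd, s, n in zip(y, s_hat, n_hat)]
    # Each speech exemplar carries a (hypothetical) state label.
    scores = [0.0] * n_states
    for weight, state in zip(xs, labels):
        scores[state] += weight
    return enhanced, scores


# Toy usage: two speech exemplars (labelled with states 0 and 1), one noise
# exemplar, and an observation built as 2 * speech[0] + 1 * noise[0].
speech = [[4.0, 1.0, 0.0], [0.0, 1.0, 4.0]]
noise = [[1.0, 1.0, 1.0]]
y = [9.0, 3.0, 1.0]
enhanced, scores = enhance_and_classify(y, speech, noise, labels=[0, 1], n_states=2)
```

In this sketch both outputs come from one decomposition, which is one plausible way to obtain an FE stream and an SC stream from the same exemplar model; it is not necessarily how the hybrid SC-FE system in this work is built.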
Index Terms: noise robustness, exemplar-based speech recognition, multi-stream decoding