Sparse Classification (SC) is an exemplar-based approach to Automatic Speech Recognition (ASR). By representing noisy speech as a sparse linear combination of speech and noise exemplars, SC makes it possible to separate speech from noise. The approach has proven robust in noisy conditions, but at the cost of degraded performance in clean conditions. In this work, rather than using the state probability estimates obtained with SC directly in Viterbi decoding, we model the probability distributions of SC with Gaussian Mixture Models (GMMs), for which purpose we introduce a novel whitening transformation. Results on the AURORA-2 task show that the proposed approach is especially effective in clean speech and in the matched noise conditions of test set A. Except in the -5 dB SNR condition, we also find substantial improvements in the non-matched noise conditions of test set B.
Index Terms: template-based ASR, noise robustness, speech modeling