The standard missing feature imputation approach to noise-robust automatic speech recognition requires a single foreground/background segmentation mask to be identified prior to reconstruction. This paper presents a novel imputation approach that couples the identification and reconstruction of missing features more closely by using a probabilistic framework based on the speech fragment decoding technique. Fragment decoding identifies the jointly most likely speech state sequence and segmentation hypothesis, which are then used to impute the missing-data regions. Crucially, imputation can thereby exploit the speech state sequence recovered by the fragment decoder. Further, using N-best decodings allows the clean spectrogram to be estimated as a weighted combination of reconstructions, which makes some allowance for uncertainty in the estimates. Experiments on the PASCAL CHiME Challenge task show that performance depends strongly on the complexity of the speech models used for segmentation and imputation, and that, by exploiting the temporal constraints of speech, the proposed system significantly outperforms systems that ignore these constraints.
Index Terms: Missing feature reconstruction, noise-robust speech recognition, feature compensation, fragment decoding