In this paper, we adress the problem of additive noise which degrades substantially the performances of speech recognition system. We propose a cepstral denoising based on the Subspace Gaussian Mixture Models paradigm (SGMM). The acoustic space is modeled by using a UBM-GMM. Each phoneme is modeled by a GMM derived from the UBM. The concatenation of the means of a given GMM leads to a very high dimention space, called the supervector space. The SGMM paradigm allows to model the additive noise as an additive component located in a subspace of low dimension (with respect to the supervector space). For each speech segment, this additive noise component is estimated in a model space. From this estimation, a specific frame transformation is obtained and applied to such a data frame. In this work, training data are assumed to be clean, so the cleaning process is applied only on test data. The proposed approach is tested on data recorded in a noisy environment and also on artificially noised data. With this approach we obtain, on data recorded in a noisy environment, a relative WER reduction of 15%.
Index Terms: Acoustic modeling, Noise compensation, Subspace Gaussian Mixture Models, robust speech recognition