This paper investigates a factor analysis scheme in the joint channel space of stereo-based stochastic mapping (SSM) for noise-robust automatic speech recognition (ASR). A mixture of Bayesian factor analyzers is used to model the generative factors, namely noise type and signal-to-noise ratio (SNR), that underlie multi-condition training data. A sparsity-promoting prior is placed on the factor loading matrix so that, within each soft cluster, the effective factors are learned automatically from a redundant dictionary. Experiments on large vocabulary continuous speech recognition tasks show that this sparse Bayesian factor analysis scheme improves the noise robustness of SSM.
Index Terms: Bayesian factor analysis, sparsity learning, stereo-based stochastic mapping, noise-robust automatic speech recognition
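To make the abstract's idea concrete, the sketch below shows how a mixture of factor analyzers over the joint clean/noisy vector z = [x; y] could be used at mapping time. It assumes an MMSE-style SSM estimate, E[x | y] = sum_k p(k | y) (mu_x,k + Sigma_xy,k Sigma_yy,k^{-1} (y - mu_y,k)), with each component covariance factor-analytic, Sigma_k = Lambda_k Lambda_k^T + diag(psi_k). The function names (`ssm_mmse`, `effective_factors`) and the random parameters are hypothetical placeholders. The Bayesian training with the sparsity-promoting prior is not shown; the zeroed loading columns only mimic what a sparse solution might look like.

```python
import numpy as np


def log_gauss(y, mean, cov):
    """Log-density of a multivariate Gaussian N(y; mean, cov), via Cholesky."""
    d = y.shape[0]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, y - mean)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + z @ z)


def effective_factors(loading, tol=1e-6):
    """Indices of loading columns (factors) whose norm exceeds tol.

    Hypothetical helper: under a sparsity-promoting prior, many columns of a
    redundant loading dictionary are expected to shrink to (near) zero.
    """
    return np.flatnonzero(np.linalg.norm(loading, axis=0) > tol)


def ssm_mmse(y, weights, means, loadings, psis, dx):
    """MMSE estimate E[x | y] under a mixture of factor analyzers on z = [x; y].

    Component k models z ~ N(means[k], loadings[k] @ loadings[k].T + diag(psis[k])),
    where the first dx entries of z are the clean channel x and the rest are the
    noisy channel y.
    """
    K = len(weights)
    log_post = np.empty(K)
    cond_means = np.empty((K, dx))
    for k in range(K):
        cov = loadings[k] @ loadings[k].T + np.diag(psis[k])
        mu_x, mu_y = means[k][:dx], means[k][dx:]
        s_xy, s_yy = cov[:dx, dx:], cov[dx:, dx:]
        # Responsibility of component k given only the noisy channel y.
        log_post[k] = np.log(weights[k]) + log_gauss(y, mu_y, s_yy)
        # Gaussian conditional mean of x given y within component k.
        cond_means[k] = mu_x + s_xy @ np.linalg.solve(s_yy, y - mu_y)
    # Normalize responsibilities in a numerically stable way.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return post @ cond_means, post


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, dx, dy, n_factors = 3, 4, 4, 6
    weights = np.full(K, 1.0 / K)
    means = [rng.normal(size=dx + dy) for _ in range(K)]
    loadings = []
    for _ in range(K):
        lam = rng.normal(size=(dx + dy, n_factors))
        lam[:, 2:] = 0.0  # placeholder for a sparse solution: 2 of 6 factors kept
        loadings.append(lam)
    psis = [np.full(dx + dy, 0.1) for _ in range(K)]
    y = rng.normal(size=dy)
    x_hat, post = ssm_mmse(y, weights, means, loadings, psis, dx)
    print("effective factors:", [effective_factors(lam).tolist() for lam in loadings])
    print("posterior:", post.round(3))
    print("x_hat:", x_hat.round(3))
```

One appeal of the factor-analytic form is that each cluster's joint covariance is described by a low-rank loading matrix plus a diagonal, so the number of free parameters grows with the number of effective factors rather than with a full (dx + dy) x (dx + dy) covariance.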