Adaptive training is a powerful approach for building speech recognition systems using non-homogeneous data. This work presents an extension of model-based adaptive training to handle reverberant environments. The recently proposed Reverberant VTS-Joint (RVTSJ) adaptation is used to factor out unwanted additive and reverberant noise variations in multiconditional training data, yielding a canonical model neutral to noise conditions. An maximum likelihood estimation of the canonical model parameters is described. An initialisation scheme that uses the VTS-based adaptive training to initialise the model parameters is also presented. Experiments are conducted on a reverberant simulated AURORA4 task.
Index Terms: reverberant noise robustness, vector Taylor series, adaptive training