Discriminative language modeling provides a mechanism for differentiating between competing word hypotheses, which are usually ignored in traditional maximum likelihood estimation of N-gram language models. However, it usually requires manual transcriptions, which are costly and slow to obtain. On the other hand, vast amounts of untranscribed speech data are available, to which offline adaptation techniques can be applied to generate pseudo-truth transcriptions as an approximation to manual transcriptions. Viewing manual and pseudo-truth transcriptions as two domains, we perform hierarchical Bayesian domain adaptation on discriminative language models that share a common prior model. The domain-specific and prior models are estimated jointly from the training data. In an N-best list rescoring experiment, hierarchical Bayesian domain adaptation yields better recognition performance than a model trained only on manual transcriptions, and appears robust against an inferior prior.
Index Terms: hierarchical Bayesian domain adaptation, discriminative language modeling, semi-supervised learning