ISCA Archive Interspeech 2010

Novel weighting scheme for unsupervised language model adaptation using latent Dirichlet allocation

Md. Akmal Haidar, Douglas O'Shaughnessy

We introduce a new approach for computing the weights of topic models in language model (LM) adaptation. Topic clusters are formed by a hard-clustering method that assigns each document to a single topic: the topic from which the largest number of that document's words were drawn in the Latent Dirichlet Allocation (LDA) analysis. The new weighting idea is to compute the mixture weights from the unigram counts of the topics generated by hard clustering, instead of from the LDA latent topic word counts used in the literature. Our approach yields significant reductions in perplexity and word error rate (WER) compared with the existing approach.
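
The two steps described above, hard clustering followed by count-based mixture weights, can be illustrated with a minimal Python sketch. The abstract does not give the weighting formula, so the form used here (each word of the adaptation text splits one unit of mass across topics in proportion to its unigram count in each hard cluster) is an assumption made for illustration only, not the authors' exact method. The function names, the toy data, and the tie-breaking rule (lowest topic id wins) are likewise hypothetical.

```python
from collections import Counter, defaultdict


def hard_cluster(doc_word_topics):
    """Assign each document to the topic that generated most of its words.

    doc_word_topics maps a document id to a list of (word, topic) pairs,
    i.e. the per-word topic choices from an LDA analysis.
    Ties are broken in favour of the lowest topic id (an assumption).
    """
    assignment = {}
    for doc_id, pairs in doc_word_topics.items():
        topic_counts = Counter(topic for _, topic in pairs)
        assignment[doc_id] = max(sorted(topic_counts), key=topic_counts.__getitem__)
    return assignment


def cluster_unigram_counts(doc_word_topics, assignment):
    """Count unigrams per topic cluster, using every word of each member document."""
    counts = defaultdict(Counter)
    for doc_id, pairs in doc_word_topics.items():
        counts[assignment[doc_id]].update(word for word, _ in pairs)
    return counts


def mixture_weights(adapt_words, counts):
    """Illustrative mixture weights (assumed form, not taken from the paper).

    Each adaptation word distributes one unit of mass over the topics in
    proportion to its unigram count in each hard cluster; the accumulated
    scores are then normalised to sum to one.
    """
    topics = sorted(counts)
    scores = {k: 0.0 for k in topics}
    for word in adapt_words:
        total = sum(counts[k][word] for k in topics)
        if total == 0:
            continue  # word unseen in every cluster: contributes nothing
        for k in topics:
            scores[k] += counts[k][word] / total
    norm = sum(scores.values())
    if norm == 0:
        return {k: 1.0 / len(topics) for k in topics}  # fall back to uniform
    return {k: s / norm for k, s in scores.items()}


# Toy per-word topic choices for three documents (hypothetical data).
doc_word_topics = {
    "d1": [("stock", 0), ("market", 0), ("game", 1)],
    "d2": [("game", 1), ("team", 1), ("market", 0)],
    "d3": [("stock", 0), ("price", 0)],
}

assignment = hard_cluster(doc_word_topics)
counts = cluster_unigram_counts(doc_word_topics, assignment)
weights = mixture_weights(["stock", "market", "price"], counts)
print(assignment)
print(weights)
```

In an adaptation setting, weights of this kind would be used to interpolate the topic-specific LMs trained on each cluster. Note that the cluster counts include every word of a member document, whatever topic LDA chose for that word, which is what distinguishes them from the LDA latent topic word counts.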