ISCA Archive Interspeech 2015
ISCA Archive Interspeech 2015

Improved hindi broadcast ASR by adapting the language model and pronunciation model using a priori syntactic and morphophonemic knowledge

Preethi Jyothi, Mark Hasegawa-Johnson

In this work, we present a new large-vocabulary, broadcast news ASR system for Hindi. Since Hindi has a largely phonemic orthography, the pronunciation model was automatically generated from text. We experiment with several variants of this model and study the effect of incorporating word boundary information with these models. We also experiment with knowledge-based adaptations to the language model in Hindi, derived in an unsupervised manner, that lead to small improvements in word error rate (WER). Our experiments were conducted on a new corpus assembled from publicly-available Hindi news broadcasts. We evaluate our techniques on an open-vocabulary task and obtain competitive WERs on an unseen test set.