ISCA Archive Interspeech 2022

Semi-supervised Acoustic and Language Modeling for Hindi ASR

Tarun Sai Bandarupalli, Shakti Rath, Nirmesh Shah, Onoe Naoyuki, Sriram Ganapathy

This paper describes our team's submission to the Hindi Gram Vaani ASR challenge, which involves building an ASR system for spontaneous telephonic recordings. The challenge is unique because only a small amount of labelled data is available for model development. In addition, the acoustic variability in the corpus, arising from the spontaneity of natural conversations, the rich diversity of Hindi across India, and varied background conditions, makes the task considerably harder. We participated in two of the three tracks: the first track provides only 100 hours of labelled speech, while the second provides an additional 1000 hours of unlabelled speech along with the 100 hours of labelled speech. For both tracks, we developed a Kaldi-based hybrid system comprising a character-based TDNN-F acoustic model, N-gram first-pass decoding, RNN-LM rescoring, and system combination. For the second track, we also trained an end-to-end (E2E) conformer-based system on representations obtained from a contrastive predictive coding (CPC) model. On the 5-hour development set, the results for both tracks are significantly better than the baseline results published by the challenge organizers.
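To illustrate the two-pass idea behind N-gram first-pass decoding followed by RNN-LM rescoring, the following is a minimal sketch and not the pipeline used in the submission. It assumes a simplified N-best list rather than lattices, and the names `Hypothesis`, `rescore_nbest`, `lm_scale`, and `interp` are hypothetical. The neural LM is passed in as a plain function, and the two language model log-probabilities are combined by log-linear interpolation, which is only one of several possible choices.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    text: str
    acoustic_score: float  # log-likelihood from the acoustic model
    ngram_lm_score: float  # log-probability from the first-pass N-gram LM


def rescore_nbest(
    nbest: List[Hypothesis],
    neural_lm_logprob: Callable[[str], float],
    lm_scale: float = 1.0,
    interp: float = 0.5,
) -> List[Hypothesis]:
    """Re-rank first-pass hypotheses with a second-pass language model.

    interp is the weight given to the neural LM; (1 - interp) stays with
    the N-gram LM. Both values here are illustrative assumptions.
    """

    def total_score(hyp: Hypothesis) -> float:
        lm_score = (1.0 - interp) * hyp.ngram_lm_score + interp * neural_lm_logprob(
            hyp.text
        )
        return hyp.acoustic_score + lm_scale * lm_score

    # Highest combined log-score first.
    return sorted(nbest, key=total_score, reverse=True)


# Toy example with made-up scores and a stand-in for the neural LM.
toy_nbest = [
    Hypothesis("hypothesis a", acoustic_score=-10.0, ngram_lm_score=-5.0),
    Hypothesis("hypothesis b", acoustic_score=-10.5, ngram_lm_score=-5.0),
]
toy_neural_scores = {"hypothesis a": -8.0, "hypothesis b": -2.0}

reranked = rescore_nbest(toy_nbest, lambda text: toy_neural_scores[text])
first_pass_only = rescore_nbest(
    toy_nbest, lambda text: toy_neural_scores[text], interp=0.0
)
```

With these toy numbers, the first pass alone prefers "hypothesis a", while the interpolated score moves "hypothesis b" to the top, which is the kind of reordering that second-pass rescoring is meant to produce.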