ISCA Archive ICSLP 2002

Integrating multiple pronunciations during MCE-based acoustic model training for large vocabulary speech recognition

Rathi Chengalvarayan

In this paper, we report on the implementation of an automatic method for discovering an appropriate pronunciation for each speech utterance of every speaker and integrating this information into the minimum classification error (MCE) based training algorithm. In existing supervised acoustic model training, the phoneme sequence of a given word is always fixed, irrespective of speaker accent and pronunciation variation; the proposed method offers considerably more flexibility by incorporating multiple pronunciations during training. Large vocabulary recognition experiments on the French SpeechDat-II speech corpus show consistent string error rate reductions of about 48% and 13% for the proposed integrated method relative to the maximum likelihood estimation (MLE) trained and MCE-trained baseline systems, respectively.
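The abstract does not give the training equations, so the following is only a minimal sketch of how per-utterance pronunciation selection could feed an MCE loss. It assumes (1) that the "appropriate pronunciation" is the variant of the reference transcription with the highest log-likelihood score, and (2) the commonly used MCE formulation with a soft-max misclassification measure and a sigmoid loss. The functions `select_pronunciation`, `misclassification_measure` and `mce_loss`, the parameters `eta`, `gamma` and `theta`, and all scores are hypothetical illustrations, not the author's implementation.

```python
import math
from typing import Dict, Sequence, Tuple


def select_pronunciation(variant_scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the pronunciation variant with the highest log-likelihood.

    Assumption: `variant_scores` maps a hypothetical variant label to the
    log-likelihood of the utterance aligned against that phoneme sequence.
    """
    best = max(variant_scores, key=variant_scores.get)
    return best, variant_scores[best]


def misclassification_measure(correct: float, competitors: Sequence[float],
                              eta: float = 1.0) -> float:
    """d = -g_correct + (1/eta) * log((1/M) * sum_j exp(eta * g_j)).

    Computed with a log-sum-exp shift for numerical stability. d < 0 means
    the correct string outscores the (soft) best competitor.
    """
    scaled = [eta * g for g in competitors]
    peak = max(scaled)
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return -correct + (log_sum - math.log(len(competitors))) / eta


def mce_loss(d: float, gamma: float = 1.0, theta: float = 0.0) -> float:
    """Smoothed 0-1 loss: sigmoid(gamma * d + theta), written to avoid overflow."""
    z = gamma * d + theta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical scores for one utterance: two pronunciation variants of the
# reference transcription and two competing (incorrect) hypotheses.
variants = {"variant_a": -120.0, "variant_b": -100.0}
competing = [-110.0, -120.0]

label, g_correct = select_pronunciation(variants)
d = misclassification_measure(g_correct, competing)
loss = mce_loss(d)
print(label, round(d, 3), loss)
```

Under these assumptions, using the selected variant's score as the correct-class score means the loss measures the margin against the best-matching pronunciation for that speaker, rather than against a single fixed canonical phoneme sequence.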