ISCA Archive Eurospeech 2001

Modeling pronunciation variation using context-dependent weighting and b/s refined acoustic modeling

Fang Zheng, Zhanjiang Song, Pascale Fung, William Byrne

Pronunciation variability is an important issue that must be addressed when developing practical automatic spontaneous speech recognition systems. By studying the initial/final (IF) characteristics of the Chinese language and developing the Bayesian equation, we propose the concepts of the generalized initial/final (GIF) and the generalized syllable (GS), the GIF modeling method and the IF-GIF modeling method, and a context-dependent pronunciation weighting method. With these approaches, IF-GIF modeling reduces the Chinese syllable error rate (SER) by 6.3% and 4.2% compared with GIF modeling and IF modeling, respectively, when no language model, such as a syllable or word N-gram, is used.