ISCA Archive Eurospeech 2001

The effect of pitch and lexical tone on different Mandarin speech recognition tasks

Yiu Wing Wong, Eric Chang

Tone is an important component in Mandarin speech recognition: the five lexical tones must be recognized to disambiguate between confusable words. Tone is acoustically characterized by the pitch contour, and the use of pitch has been shown to be helpful in Mandarin syllable recognition. This paper reports a comprehensive set of investigations into the effect of pitch on diverse Mandarin speech recognition tasks, namely large vocabulary continuous speech recognition (LVCSR) and isolated word recognition. Various techniques for utilizing pitch in acoustic modeling are examined; in particular, the modeling of tone context dependence and the normalization of pitch values are investigated. Experimental results show that incorporating pitch yields an error reduction of 26% in tonal syllable recognition. The same level of error reduction is attained in isolated word recognition. The gain from using pitch in an LVCSR task, on the other hand, is smaller. The results suggest that pitch is more beneficial in Mandarin speech recognition when no language model is used; speech recognizers may therefore be designed to make use of the pitch feature dynamically to obtain the best tradeoff between accuracy and computation.
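
The abstract does not say which pitch normalization scheme was used. As an illustration only, the sketch below shows one common approach: subtracting the per-utterance mean of log-F0 over voiced frames, so that the contour shape is kept while a speaker's overall pitch level is removed. The function name `normalize_pitch`, the convention that unvoiced frames are marked with 0, and the choice of log-domain mean subtraction are all assumptions, not details taken from the paper.

```python
import math
from typing import List, Optional


def normalize_pitch(f0_hz: List[float]) -> List[Optional[float]]:
    """Hypothetical per-utterance pitch normalization (not the paper's method).

    Voiced frames (f0 > 0) are mapped to log(f0) minus the mean log(f0)
    of the utterance; unvoiced frames (f0 <= 0) are returned as None.
    """
    # Collect log-F0 for voiced frames only.
    voiced_logs = [math.log(f) for f in f0_hz if f > 0]
    if not voiced_logs:
        # No voiced frames: nothing to normalize.
        return [None] * len(f0_hz)

    mean_log = sum(voiced_logs) / len(voiced_logs)

    # Subtract the utterance-level mean so the result is level-independent.
    return [math.log(f) - mean_log if f > 0 else None for f in f0_hz]


if __name__ == "__main__":
    # Toy contour in Hz; 0.0 marks an unvoiced frame (assumed convention).
    contour = [100.0, 0.0, 400.0]
    print(normalize_pitch(contour))
```

Under this scheme, multiplying every voiced F0 value by a constant factor (for example, a speaker with a uniformly higher voice) leaves the normalized contour unchanged, which is the property a tone model would presumably want.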