ISCA Archive ICSLP 2002
ISCA Archive ICSLP 2002

Modeling frequent allophones in Japanese speech recognition

Long Nguyen, Xuefeng Guo, John Makhoul

In this paper, we describe a technique to model frequent allophones in Japanese speech recognition. The Consonant-Vowel syllabic structure (CV) is dominant in Japanese. Based on frequency, the distribution of CV pairs is rather skewed. Isolating out the most frequent allophones through the use of additional phonemes in acoustic modeling can achieve better recognition accuracy. By introducing ten new phonemes for the five most common CV pairs, we achieved a 30% relative reduction in word error rate for spontaneous speech and 6% relative reduction overall for all speech categories in a Japanese broadcast news transcription task.