ISCA Archive Eurospeech 2003

Improvement of non-native speech recognition by effectively modeling frequently observed pronunciation habits

Nobuaki Minematsu, Koichi Osaki, Keikichi Hirose

In this paper, two techniques are proposed to improve recognition performance for non-native speech (Japanese English, i.e., English spoken by Japanese speakers). The first technique integrates the orthographic representation of a phoneme as an additional context for state clustering when training tied-state triphones. Non-native speakers have often learned the target language through their eyes rather than their ears, so it is reasonable to assume that their pronunciation of a phoneme may depend on its grapheme. Here, the correspondence between a vowel and its grapheme is automatically extracted and used as an additional context in the state clustering. The second technique couples a Japanese English acoustic model with a Japanese Japanese model (Japanese spoken by Japanese speakers) to form a parallel model. When triphones are used, the mapping between the two models must be trained carefully because their phoneme sets differ. Here, several phoneme recognition experiments are carried out to induce the mapping, and a tentative coupling method based on that mapping is examined. Results of LVCSR experiments show the high validity of both proposed methods.