ISCA Archive SLaTE 2007

Structural representation of pronunciation and its application for classifying Japanese learners of English

Nobuaki Minematsu, K. Kamata, S. Asakawa, T. Makino, Keikichi Hirose

One of the most fundamental unsolved problems in speech recognition is the mismatch problem. Speech systems trained on a specific group of speakers, e.g. adults, do not work well with another group, e.g. children. In the case of CALL, when a student receives a bad score from a system, it may be simply because the student is an outlier to the system. The problem is that the student cannot know whether this is the case. Recently, a speaker-invariant structural and holistic representation of speech was proposed, in which only the interrelations among speech sounds are extracted to form their external structure. Speech variation caused by speaker individuality was modeled mathematically and, based on that model, speaker invariance was guaranteed. This structural representation has already been applied to describe the pronunciations of language learners. Since the non-linguistic factors were effectively removed, the representation showed purely the non-nativeness in the individual pronunciations. In this paper, using the new representation, language learners are automatically classified irrespective of speaker individuality. The classification is also done by an expert phonetician. High correlation is found between the two classifications.
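The idea of keeping only the interrelations among speech sounds can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: each sound is modeled as a one-dimensional Gaussian, speaker individuality is modeled as an affine map of the feature axis, and the Bhattacharyya distance is used as the distance between sounds. All names and numbers (`learner_a`, `learner_b`, the vowel labels and their values) are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def bhattacharyya_1d(g1, g2):
    """Bhattacharyya distance between two 1-D Gaussians given as (mean, std)."""
    (m1, s1), (m2, s2) = g1, g2
    var_sum = s1 * s1 + s2 * s2
    return 0.25 * (m1 - m2) ** 2 / var_sum + 0.5 * math.log(var_sum / (2.0 * s1 * s2))


def structure(sounds):
    """Vector of distances between every pair of sounds, in sorted label order."""
    labels = sorted(sounds)
    return [bhattacharyya_1d(sounds[a], sounds[b]) for a, b in combinations(labels, 2)]


def structure_distance(s1, s2):
    """Root-mean-square difference between two structure vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s1, s2)) / len(s1))


def affine(sounds, scale, shift):
    """Assumed speaker model: every sound is mapped by x -> scale * x + shift."""
    return {k: (scale * m + shift, abs(scale) * s) for k, (m, s) in sounds.items()}


# Hypothetical learner: three vowels as (mean, std) on one feature axis.
learner_a = {"a": (0.0, 1.0), "i": (4.0, 1.2), "u": (8.0, 0.8)}
# Same pronunciation habits, different voice (affine change only).
learner_a_other_voice = affine(learner_a, scale=1.5, shift=-3.0)
# Different pronunciation habits: "u" is produced close to "i".
learner_b = {"a": (0.0, 1.0), "i": (4.0, 1.2), "u": (4.5, 0.8)}

same = structure_distance(structure(learner_a), structure(learner_a_other_voice))
diff = structure_distance(structure(learner_a), structure(learner_b))
print(f"same habits, other voice: {same:.6f}")
print(f"different habits:         {diff:.6f}")
```

Under these assumptions the first value is zero up to floating-point error, because the affine map cancels out of each pairwise distance, while the second is clearly larger. Learners could then be grouped by such structure-to-structure distances without reference to who is speaking.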