With the increasing use of technology in classrooms, computer assisted pronunciation training (CAPT) is becoming a vital tool in language learning. In this paper, we present a system that takes advantage of data from learners of a specific L1 to better model phonological errors at various levels in the system. At the lexical level, a statistical machine translation approach is used to model common phonological errors produced by a specific L1 population. At the acoustic level, L1-dependent maximum likelihood (ML) nonnative models and discriminative training are explored. In our experiments, use of a Korean language dependent nonnative lexicon gives us diagnostic abilities that did not exist in our baseline configuration. Replacing the native ML acoustic model with the L1-dependent nonnative model produces relative improvements of 27.37% in precision for phone detection/identification tasks. We also propose a constrained variant of minimum phone error (MPE) training which is better adapted to phone detection/diagnosis. This technique produces 5.6% relative improvement in precision in comparison to ML nonnative acoustic models.
Index Terms: language learning, phonological error modeling, machine translation, minimum phone error training