Posterior probability computed within a standard pronunciation space (SPS) is a common measure in automatic pronunciation error detection (APED). However, when a pronunciation error lies outside the SPS, this method can only find an approximate solution, which is incorrect in many cases. This paper extends the SPS to cover more pronunciation errors, proposes an unsupervised clustering of pronunciation errors based on the Bhattacharyya distance, and then builds more detailed acoustic models for APED within the extended pronunciation space (EPS). The relationship between the performance of the APED system and the number of clusters, or the size of the EPS, is also discussed. Experimental results show that, compared with APED based on the SPS, APED based on the EPS with adaptive unsupervised clustering of pronunciation errors performs better: the average scoring error rate (ASER) decreases from 0.412 to 0.301, a relative reduction of 26.94%.
Index Terms: automatic pronunciation error detection, pronunciation space, unsupervised clustering of pronunciation errors, Bhattacharyya distance
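
As an illustration of the distance measure named above, the following is a minimal sketch of the Bhattacharyya distance between two Gaussians. It assumes each pronunciation-error model is summarized by a single diagonal-covariance Gaussian, which the abstract does not state; the function name `bhattacharyya_gaussian` and the toy parameters are hypothetical and are not taken from the paper. A clustering procedure could use such pairwise distances to merge acoustically similar error models.

```python
import numpy as np


def bhattacharyya_gaussian(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    Assumption (not from the paper): each model is one Gaussian with a
    mean vector and a vector of per-dimension variances.
    """
    mu1, var1 = np.asarray(mu1, dtype=float), np.asarray(var1, dtype=float)
    mu2, var2 = np.asarray(mu2, dtype=float), np.asarray(var2, dtype=float)

    # Averaged covariance (diagonal entries only).
    var = 0.5 * (var1 + var2)

    # Term 1: (1/8) * (mu1 - mu2)^T Sigma^{-1} (mu1 - mu2)
    mean_term = 0.125 * np.sum((mu1 - mu2) ** 2 / var)

    # Term 2: (1/2) * ln( det(Sigma) / sqrt(det(Sigma1) * det(Sigma2)) ),
    # computed in the log domain for numerical stability.
    cov_term = 0.5 * np.sum(np.log(var) - 0.5 * (np.log(var1) + np.log(var2)))

    return float(mean_term + cov_term)


# Toy example: two hypothetical 2-dimensional error models.
d_same = bhattacharyya_gaussian([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
d_diff = bhattacharyya_gaussian([0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [1.0, 1.0])
```

The distance is zero for identical models and grows as the means or variances diverge, which is what makes it usable as a merge criterion in an unsupervised clustering step.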