In this paper, we discuss methods for automatically assessing the tone of non-native Mandarin speakers and propose a context-dependent, syllable-level tone modeling method for tone assessment. A direct comparison between a speaker's energy and pitch contours and ideal contours, using a syllable-level normalization technique, provides a strong prediction of the speaker's tone as rated by humans. By combining the energy and pitch features with other features, such as duration and spectral likelihoods at the phoneme level, we achieved a human-machine correlation coefficient of 0.77 at the response level and 0.85 at the participant level. For comparison, the correlation coefficient between human raters was 0.66 at the response level. The results support both the proposed method and the use of Read Aloud as a task for automatically assessing non-native Mandarin speakers' tone.
Index Terms: Chinese, Mandarin, tone, prosody, assessment, proficiency test
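
As an illustration of comparing a speaker's contour with an ideal contour after syllable-level normalization, the following minimal Python sketch may help. It is not the method evaluated in this paper: the function names (`resample`, `z_normalize`, `contour_distance`), the fixed contour length of 20 points, the per-syllable mean and variance normalization, and the root-mean-square distance are all assumptions chosen for illustration.

```python
import math


def resample(values, num_points):
    """Linearly interpolate a contour onto num_points evenly spaced positions.

    Assumes num_points >= 2. This removes syllable duration differences.
    """
    if len(values) == 1:
        return [float(values[0])] * num_points
    out = []
    for i in range(num_points):
        pos = i * (len(values) - 1) / (num_points - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out


def z_normalize(values):
    """Subtract the syllable mean and divide by the syllable standard deviation.

    Hypothetical normalization: it removes the speaker's pitch (or energy)
    level and range so that only the contour shape remains.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)
    if std == 0.0:
        return [0.0] * len(values)  # flat contour: no shape information
    return [(v - mean) / std for v in values]


def contour_distance(speaker, ideal, num_points=20):
    """Root-mean-square distance between two normalized syllable contours."""
    a = z_normalize(resample(speaker, num_points))
    b = z_normalize(resample(ideal, num_points))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / num_points)


# A rising speaker contour (Hz) compared with rising and falling templates.
rising_match = contour_distance([100, 120, 140, 160], [200, 300])
falling_mismatch = contour_distance([100, 120, 140, 160], [300, 200])
```

Under these assumptions, a contour with the same shape as the template yields a distance near zero regardless of the speaker's pitch level or range, while a contour with the opposite shape yields a large distance. Such a distance could serve as one feature among those combined for scoring.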