In earlier studies, we assessed the degree of non-nativeness employing prosodic information. In this paper, we combine prosodic information with (1) features derived from a Gaussian Mixture Model used as Universal Background Model (GMM-UBM), a powerful approach used in speaker identification, and (2) openSMILE, a standard open-source toolkit for extracting acoustic features. We evaluate our approach with English speech from 94 non-native speakers. GMM-UBM or openSMILE modelling alone yields lower performance than our prosodic feature vector; however, adding information from the GMM-UBM modelling or openSMILE by late fusion improves results.
Index Terms: computer-assisted language learning, non-native prosody, rhythm, automatic assessment