ISCA Archive SLaTE 2023

Sensitivity to Phonemic Contrasts and Insensitivity to Non-phonemic Contrasts of Various Speech Representations Tested for L2 Speech Assessment

Haitong Sun, Yingxiang Gao, Yusuke Shozui, Tong Ma, Nobuaki Minematsu

To assess the segmental aspect of L2 speech produced by various types of learners, researchers and teachers need speech representations that satisfy two conditions: they must capture phonemic contrasts accurately and ignore non-phonemic contrasts adequately. Acoustically, both types of contrast are characterized equally well by spectral envelopes, so purely acoustic representations such as MFCC cannot satisfy both conditions. Recently, phonetic posteriorgrams estimated by the DNN-based acoustic models of ASR systems have been used for L2 assessment. More recently, various self-supervised representations, such as wav2vec2 and WavLM, have been proposed. In this study, we set up a simple and adequate metric to examine sensitivity to phonemic contrasts and insensitivity to non-phonemic contrasts, and use it to compare various pretrained models. Experiments show that WavLM is superior to the other self-supervised representations and, in some cases, even better than supervised representations.
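The abstract does not define the metric itself, so the following is only a hypothetical illustration of the general idea, not the authors' method. It assumes each utterance is already encoded as a sequence of feature frames (for example MFCC, posteriorgram, or WavLM frames), compares two utterances with length-normalized dynamic time warping (DTW), and scores a representation by dividing the mean distance over phonemic-contrast pairs (e.g., minimal pairs) by the mean distance over non-phonemic-contrast pairs (e.g., the same word spoken by different speakers). The names `dtw_distance` and `contrast_score` are invented for this sketch.

```python
import math
from statistics import mean
from typing import Iterable, Sequence, Tuple

Frame = Sequence[float]
Utterance = Sequence[Frame]
Pair = Tuple[Utterance, Utterance]


def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(u: Utterance, v: Utterance) -> float:
    """DTW alignment cost between two frame sequences, normalized by len(u) + len(v)."""
    n, m = len(u), len(v)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(u[i - 1], v[j - 1])
            # Standard DTW step: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def contrast_score(phonemic_pairs: Iterable[Pair],
                   non_phonemic_pairs: Iterable[Pair]) -> float:
    """Hypothetical score: mean distance over phonemic-contrast pairs divided by
    mean distance over non-phonemic-contrast pairs. Higher values mean the
    representation separates phonemic contrasts more than non-phonemic ones.
    Raises ZeroDivisionError if every non-phonemic pair has distance 0."""
    phon = mean(dtw_distance(a, b) for a, b in phonemic_pairs)
    non = mean(dtw_distance(a, b) for a, b in non_phonemic_pairs)
    return phon / non
```

With toy two-dimensional frames, a representation that places a minimal pair far apart ([0, 0] vs. [3, 4]) but a speaker variant close together ([0, 0] vs. [0.75, 1.0]) gets a score of 4.0. Under this sketch, a purely acoustic feature in which speaker differences move frames as much as phoneme differences would score near 1.0.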