ISCA Archive Interspeech 2022

Using Fluency Representation Learned from Sequential Raw Features for Improving Non-native Fluency Scoring

Kaiqi Fu, Shaojun Gao, Xiaohai Tian, Wei Li, Zejun Ma

Automatic non-native fluency scoring is a challenging task that relies heavily on the effectiveness of the handcrafted fluency features used to predict fluency scores. In this paper, we investigate the use of a sequence model to automatically learn an utterance-level fluency representation from phone-level raw sequential features. Specifically, the raw counterparts of traditional handcrafted features (e.g., GOP, speech rate, and speech breaks) are first collected at the phone level; a pre-processing network, a BLSTM, and average pooling are then applied to the resulting feature sequence to obtain an utterance-level fluency representation for final fluency scoring. Experimental results on a non-native database suggest that the proposed framework outperforms the handcrafted-feature-based systems in terms of the Pearson correlation coefficient (PCC). In addition, an ablation study is performed to better understand the improvements brought by the different raw features and representation strategies used in our proposed fluency scorer.
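The abstract outlines a pipeline of phone-level raw features, a pre-processing network, a BLSTM, average pooling, and a final scorer. The following is a minimal sketch of that pipeline, not the authors' implementation: the class name `FluencyScorer`, layer sizes, activation choices, and the assumption of three raw features per phone are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' code) of the described pipeline:
# phone-level raw features -> pre-processing net -> BLSTM -> average pooling
# -> utterance-level fluency score. All dimensions and names are assumptions.
import torch
import torch.nn as nn


class FluencyScorer(nn.Module):
    def __init__(self, raw_feat_dim: int = 3, hidden_dim: int = 64):
        super().__init__()
        # Pre-processing net: projects phone-level raw features
        # (e.g., GOP, speech rate, break duration) into a shared space.
        self.pre_net = nn.Sequential(
            nn.Linear(raw_feat_dim, hidden_dim),
            nn.ReLU(),
        )
        # BLSTM models temporal dependencies across the phone sequence.
        self.blstm = nn.LSTM(
            hidden_dim, hidden_dim, batch_first=True, bidirectional=True
        )
        # Linear regressor maps the pooled utterance representation to a score.
        self.scorer = nn.Linear(2 * hidden_dim, 1)

    def forward(self, phone_feats: torch.Tensor) -> torch.Tensor:
        # phone_feats: (batch, num_phones, raw_feat_dim)
        h = self.pre_net(phone_feats)
        h, _ = self.blstm(h)
        # Average pooling over the phone axis yields the
        # utterance-level fluency representation.
        utt_repr = h.mean(dim=1)
        return self.scorer(utt_repr).squeeze(-1)


if __name__ == "__main__":
    model = FluencyScorer()
    dummy = torch.randn(2, 50, 3)  # 2 utterances, 50 phones, 3 raw features
    print(model(dummy).shape)      # torch.Size([2])
```

Training details (loss, optimizer, feature normalization) are not specified in the abstract and are therefore omitted from the sketch.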