ISCA Archive Interspeech 2022

Soft-label Learn for No-Intrusive Speech Quality Assessment

Junyong Hao, Shunzhou Ye, Cheng Lu, Fei Dong, Jingang Liu, Dong Pi

Mean opinion score (MOS) is a widely used subjective metric for assessing speech quality, and collecting it usually requires multiple human listeners to rate each speech file. To reduce the labor cost of MOS, non-intrusive speech quality assessment methods have been studied extensively. However, because speech quality labels carry strong subjective bias, it is difficult to train models that accurately predict speech quality scores. In this paper, we propose a convolutional self-attention neural network (Conformer) for MOS prediction on conference speech that effectively alleviates the negative effect of subjective bias on model training. In addition to this novel architecture, we further improve the generalization and accuracy of the predictor through attention label pooling and soft-label learning. Our proposed method achieves an RMSE of 0.458 and a PLCC of 0.792 on the evaluation test data of the Conferencing Speech 2022 Challenge.
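The abstract does not specify how soft labels, attention label pooling, or the reported metrics are computed. The sketch below shows one common way such pieces can fit together. It rests on several assumptions of ours, not the paper's. The 1 to 5 MOS range is discretized into nine bins at 0.5 steps. Each scalar MOS becomes a Gaussian-shaped soft label with `sigma=0.5`. Training uses cross-entropy against that soft label, and the predicted MOS is the expectation over bins. "Attention label pooling" is read loosely as softmax-weighted pooling over frame-level outputs. All function names are hypothetical. RMSE and PLCC follow their standard definitions.

```python
import math

# Hypothetical discretization of the 1-5 MOS scale into 0.5-wide bins (assumption).
BINS = [1.0 + 0.5 * i for i in range(9)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label(mos, sigma=0.5, bins=BINS):
    """Turn a scalar MOS into a Gaussian-shaped distribution over bins (assumed scheme)."""
    weights = [math.exp(-((b - mos) ** 2) / (2 * sigma ** 2)) for b in bins]
    total = sum(weights)
    return [w / total for w in weights]


def soft_label_loss(logits, target):
    """Cross-entropy between the predicted bin distribution and the soft target."""
    probs = softmax(logits)
    return -sum(t * math.log(max(p, 1e-12)) for t, p in zip(target, probs))


def expected_mos(logits, bins=BINS):
    """Recover a scalar MOS as the expectation of the predicted distribution."""
    return sum(p * b for p, b in zip(softmax(logits), bins))


def attention_pool(frames, scores):
    """Softmax-weighted average of frame-level vectors (one reading of attention pooling)."""
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]


def rmse(pred, true):
    """Root-mean-square error between predicted and ground-truth MOS."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


def plcc(pred, true):
    """Pearson linear correlation coefficient (undefined for constant inputs)."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in true))
    return cov / (sp * st)


if __name__ == "__main__":
    target = soft_label(3.2)
    # Logits of an idealized predictor that reproduces the soft label exactly.
    ideal_logits = [math.log(p) for p in target]
    print(round(expected_mos(ideal_logits), 2))
    print(soft_label_loss(ideal_logits, target) < soft_label_loss([0.0] * 9, target))
```

Feeding logits equal to `log(soft_label(3.2))` through `expected_mos` recovers a MOS of about 3.2. This round trip is why a model trained on distributions can still be scored with RMSE and PLCC against scalar labels. The soft target also spreads credit across neighbouring bins instead of penalizing a prediction of 3.0 as fully wrong for a label of 3.2. This is one plausible motivation for soft labels under noisy, subjective ratings.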