ISCA Archive Interspeech 2022
ISCA Archive Interspeech 2022

Non-intrusive Speech Quality Assessment with a Multi-Task Learning based Subband Adaptive Attention Temporal Convolutional Neural Network

Xiaofeng Shu, Yanjie Chen, Chuxiang Shang, Yan Zhao, Chengshuai Zhao, Yehang Zhu, Chuanzeng Huang, Yuxuan Wang

In terms of subjective evaluations, speech quality has been generally described by a mean opinion score (MOS). In recent years, non-intrusive speech quality assessment shows an active progress by leveraging deep learning techniques. In this paper, we propose a new multi-task learning based model, termed as subband adaptive attention temporal convolutional neural net- work (SAA-TCN), to perform non-intrusive speech quality assessment with the help of MOS value interval detector (VID) auxiliary task. Instead of using fullband magnitude spectrogram, the proposed model takes subband magnitude spectrogram as the input to reduce model parameters and prevent over- fitting. To effectively utilize the energy distribution information along the subband frequency dimension, subband adaptive attention (SAA) is employed to enhance the TCN model. Experimental results reveal that the proposed method achieves a superior performance on predicting the MOS values. In ConferencingSpeech 2022 Challenge, our method achieves a mean Pearson's correlation coefficient (PCC) score of 0.763 and out- performs the challenge baseline method by 0.233.