ISCA Archive Interspeech 2022

Automatic Mean Opinion Score Estimation with Temporal Modulation Features on Gammatone Filterbank for Speech Assessment

Huy Nguyen, Kai Li, Masashi Unoki

The mean opinion score (MOS) obtained from listening tests is a key component of speech quality evaluation. However, because subjective tests are too costly to conduct on a large scale, the MOS must be estimated objectively. Thus far, the features used in existing methods for automatic MOS prediction have not been based on human perception of speech. In this paper, we propose an automatic MOS estimation method that uses temporal modulation features on the gammatone filterbank to improve the correlation of the predicted MOS with human perception. We evaluated our method using utterance-level and system-level mean squared errors (MSEs) and Spearman rank correlation coefficients (SRCCs). Compared with the baseline method of the VoiceMOS challenge, the proposed method performed better on both utterance-level metrics and on system-level SRCC. It also exhibited a significant improvement for utterances with low MOS values.
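
To make the evaluation metrics concrete, the following is a minimal sketch of how utterance-level and system-level MSE and SRCC could be computed. It is not the authors' evaluation code. It assumes that system-level scores are obtained by averaging the true and predicted MOS over all utterances of each system before computing the metrics. The function names (`average_ranks`, `srcc`, `mse`, `evaluate`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def evaluate(system_ids, true_mos, pred_mos):
    """Compute utterance-level and system-level MSE and SRCC.

    Assumption: a system's score is the mean over its utterances.
    """
    true_by_sys, pred_by_sys = defaultdict(list), defaultdict(list)
    for sys_id, t, p in zip(system_ids, true_mos, pred_mos):
        true_by_sys[sys_id].append(t)
        pred_by_sys[sys_id].append(p)
    systems = sorted(true_by_sys)
    sys_true = [sum(true_by_sys[s]) / len(true_by_sys[s]) for s in systems]
    sys_pred = [sum(pred_by_sys[s]) / len(pred_by_sys[s]) for s in systems]
    return {
        "utt_mse": mse(true_mos, pred_mos),
        "utt_srcc": srcc(true_mos, pred_mos),
        "sys_mse": mse(sys_true, sys_pred),
        "sys_srcc": srcc(sys_true, sys_pred),
    }


# Toy data (hypothetical): three systems with two utterances each.
system_ids = ["A", "A", "B", "B", "C", "C"]
true_mos = [4.0, 4.5, 2.0, 3.0, 1.0, 1.5]
pred_mos = [4.0, 4.0, 3.0, 3.0, 2.0, 1.5]
results = evaluate(system_ids, true_mos, pred_mos)
print(results)
```

In this toy example, the predictor preserves the ordering of the three systems even though individual utterance predictions are off. This illustrates why rank correlation and squared error are reported separately at both levels.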