ISCA Archive Interspeech 2023

DeePMOS: Deep Posterior Mean-Opinion-Score of Speech

Xinyu Liang, Fredrik Cumlin, Christian Schüldt, Saikat Chatterjee

We propose a deep neural network (DNN) based method that provides a posterior distribution of the mean-opinion-score (MOS) for an input speech signal. The DNN outputs the parameters of the posterior, mainly its mean and variance. We refer to the proposed method as deep posterior MOS (DeePMOS). The relevant training data is inherently limited in size (few labeled samples) and noisy, owing to the subjective nature of human listeners. For robust training of DeePMOS, we combine maximum-likelihood learning, stochastic gradient noise, and a student-teacher learning setup. Using the mean of the posterior as a point estimate, we evaluate DeePMOS on standard performance measures. The results show performance comparable to existing DNN-based methods that provide only point estimates of the MOS. We also provide an ablation study showing the importance of the various components of DeePMOS.
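To make the maximum-likelihood idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the posterior is Gaussian, which the abstract does not state explicitly (it only mentions a mean and a variance), and the function names `gaussian_nll` and `mean_nll` are hypothetical. Under that assumption, maximum-likelihood learning amounts to minimizing the negative log-likelihood of the MOS labels given the predicted mean and variance.

```python
import math


def gaussian_nll(mos_label, mean, variance):
    """Negative log-likelihood of one MOS label under N(mean, variance).

    Assumption: a Gaussian posterior; the abstract only says the DNN
    outputs the posterior's mean and variance.
    """
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    squared_error = (mos_label - mean) ** 2
    return 0.5 * (math.log(2.0 * math.pi * variance) + squared_error / variance)


def mean_nll(labels, means, variances):
    """Average NLL over a batch; this is the quantity a trainer would minimize."""
    if not (len(labels) == len(means) == len(variances)):
        raise ValueError("inputs must have the same length")
    losses = [gaussian_nll(y, m, v) for y, m, v in zip(labels, means, variances)]
    return sum(losses) / len(losses)


# Hypothetical batch: listener MOS labels and per-utterance predictions.
labels = [3.5, 4.0, 2.0]
means = [3.4, 4.2, 2.5]
variances = [0.2, 0.3, 0.5]
batch_loss = mean_nll(labels, means, variances)
```

The loss penalizes an overconfident wrong prediction (small variance, large error) more heavily than an uncertain one, which is one reason a predicted variance can be useful with noisy subjective labels.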