ISCA Archive Interspeech 2024

DNSMOS Pro: A Reduced-Size DNN for Probabilistic MOS of Speech

Fredrik Cumlin, Xinyu Liang, Victor Ungureanu, Chandan K. A. Reddy, Christian Schüldt, Saikat Chatterjee

We propose a deep neural network (DNN) architecture and training design for objective, non-intrusive speech quality assessment. The method builds on DNSMOS, and we call the resulting model DNSMOS Pro. DNSMOS Pro has a reduced-size architecture suitable for VoIP, uses a relatively simple training design with only the mean opinion score (MOS) as the target label, and predicts the posterior distribution of MOS given an input speech clip. DNSMOS Pro can therefore be trained on subjectively rated datasets that report only the MOS. Furthermore, we implement several non-intrusive speech quality methods and compare them with DNSMOS Pro when training and testing on different subjectively rated datasets. On these benchmark datasets, DNSMOS Pro performs significantly better than similar DNN-based non-intrusive speech quality methods and achieves results competitive with methods that assume auxiliary information in the datasets.
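To illustrate how a model could learn a distribution over MOS from a single MOS label per clip, the following is a minimal sketch of a Gaussian negative log-likelihood loss. The Gaussian form of the posterior, the log-variance parameterization, and the function names `gaussian_nll` and `batch_loss` are assumptions made for illustration; the abstract does not specify the distribution family or the loss.

```python
import math

def gaussian_nll(mos_target, pred_mean, pred_log_var):
    """Negative log-likelihood of one MOS label under N(pred_mean, exp(pred_log_var)).

    Assumption: the posterior over MOS is modeled as a Gaussian whose mean and
    log-variance are the two outputs of a (hypothetical) network.
    """
    var = math.exp(pred_log_var)  # exponentiating keeps the variance positive
    squared_error = (mos_target - pred_mean) ** 2
    return 0.5 * (math.log(2.0 * math.pi) + pred_log_var + squared_error / var)

def batch_loss(targets, means, log_vars):
    """Average the per-clip loss over a batch of clips."""
    losses = [gaussian_nll(t, m, lv) for t, m, lv in zip(targets, means, log_vars)]
    return sum(losses) / len(losses)

# Only the MOS label of each clip is needed as the training target.
mos_labels = [3.2, 4.1, 1.8]
predicted_means = [3.0, 4.0, 2.5]
predicted_log_vars = [-1.0, -1.0, 0.0]
loss = batch_loss(mos_labels, predicted_means, predicted_log_vars)
```

Under this assumption, the predicted mean serves as the point estimate of MOS and the predicted variance expresses the uncertainty of that estimate; no per-rater scores or other auxiliary labels enter the loss.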