ISCA Archive Interspeech 2024

Embedding Learning for Preference-based Speech Quality Assessment

ChengHung Hu, Yusuke Yasuda, Tomoki Toda

One goal of speech quality assessment is to compare the quality of different utterances. Recently, several preference-based models have been developed for this purpose. These models typically derive preference scores for training by comparing mean opinion scores (MOS). However, they often treat utterance pairs with large MOS differences the same as pairs with similar MOS, which increases the cost of accurate MOS prediction. To address this issue, this study proposes an embedding loss that brings the embeddings of utterance pairs with similar MOS closer together while separating those with dissimilar MOS. Our experiments show that models trained with the embedding loss perform better in both in-domain and out-of-domain scenarios. Furthermore, we use t-SNE visualization to analyze the distribution of embeddings extracted by models trained with and without the embedding loss. The results indicate that, with the embedding loss, embeddings of utterances with similar MOS are brought closer together, whereas those with differing MOS are effectively separated.
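
The idea of pulling together embeddings of utterances with similar MOS and pushing apart those with dissimilar MOS can be illustrated with a generic contrastive-style pair loss. The sketch below is only an illustration under stated assumptions, not the loss defined in this paper: the function name `pair_embedding_loss`, the MOS-difference threshold `mos_threshold`, the `margin`, and the use of Euclidean distance are all hypothetical choices made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_embedding_loss(emb_a, emb_b, mos_a, mos_b,
                        mos_threshold=0.5, margin=1.0):
    """Contrastive-style loss for one utterance pair (illustrative only).

    mos_threshold and margin are assumed hyperparameters, not values
    taken from the paper.
    """
    dist = euclidean(emb_a, emb_b)
    if abs(mos_a - mos_b) <= mos_threshold:
        # Similar MOS: penalize any distance between the embeddings.
        return dist ** 2
    # Dissimilar MOS: penalize only if the embeddings are closer than margin.
    return max(0.0, margin - dist) ** 2


# Similar MOS (3.9 vs 4.0) but distant embeddings: large penalty.
loss_similar = pair_embedding_loss([0.0, 0.0], [3.0, 4.0], 3.9, 4.0)

# Dissimilar MOS (1.5 vs 4.5) but identical embeddings: penalty of margin**2.
loss_dissimilar = pair_embedding_loss([1.0, 1.0], [1.0, 1.0], 1.5, 4.5)
```

In a training setup such a term would typically be added to the preference objective with a weighting factor, so that the encoder is shaped by both the pairwise preference labels and the MOS-based embedding geometry; how the two are combined here is likewise an assumption.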