ISCA Archive Interspeech 2025

When The MOS Predictor Asks For Training Annotation In Cross Lingual/Domain Adaptation

Natacha Miniconi, Meysam Shamsi, Anthony Larcher

The Mean Opinion Score (MOS) is widely used to assess speech synthesis quality, but it requires costly human evaluation. Automatic MOS predictors have been developed to estimate it, yet training and generalizing these predictors across languages and domains remains difficult due to the high cost of labeled data. To optimize MOS prediction while minimizing human annotation effort, we explore active learning. To the best of our knowledge, this is the first study to investigate active learning as a training-sample selection strategy for improving MOS prediction. We evaluate its effectiveness on two tasks, cross-domain and cross-lingual adaptation, comparing multiple selection strategies based on uncertainty or diversity measures against random selection. Among these strategies, Monte Carlo (MC) Dropout proved effective for cross-lingual adaptation, while perturbation noise performed well for cross-domain adaptation.
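The MC Dropout strategy mentioned above can be illustrated with a minimal, self-contained sketch. This is not the authors' actual predictor or code: the toy one-hidden-layer regressor, its weights, and the pool/batch sizes are all hypothetical. The sketch only shows the core idea, keeping dropout active at inference, running several stochastic forward passes per unlabeled sample, and selecting for annotation the samples with the highest predictive variance.

```python
import random

random.seed(0)

IN, HID = 8, 16
# Hypothetical toy MOS regressor weights (one hidden layer) --
# a sketch of MC Dropout sample selection, not the paper's model.
W1 = [[random.gauss(0, 1) for _ in range(HID)] for _ in range(IN)]
W2 = [random.gauss(0, 1) for _ in range(HID)]

def forward(x, p_drop=0.5):
    """One stochastic forward pass with dropout kept ON at inference."""
    h = []
    for j in range(HID):
        a = max(sum(x[i] * W1[i][j] for i in range(IN)), 0.0)  # ReLU
        if random.random() < p_drop:       # dropout at test time
            a = 0.0
        else:
            a /= (1.0 - p_drop)            # inverted-dropout scaling
        h.append(a)
    return sum(h[j] * W2[j] for j in range(HID))

def mc_uncertainty(x, n_passes=50):
    """Predictive variance over n_passes stochastic forward passes."""
    preds = [forward(x) for _ in range(n_passes)]
    mean = sum(preds) / n_passes
    return sum((p - mean) ** 2 for p in preds) / n_passes

def select_most_uncertain(pool, k):
    """Active-learning step: pick the k pool samples the model is
    least certain about, to send for human MOS annotation."""
    scores = [(mc_uncertainty(x), i) for i, x in enumerate(pool)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]

pool = [[random.gauss(0, 1) for _ in range(IN)] for _ in range(100)]
chosen = select_most_uncertain(pool, k=10)
```

In an active-learning loop, the `k` selected samples would be labeled by human raters, added to the training set, and the predictor retrained before the next selection round.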