ISCA Archive Interspeech 2024

No-Reference Speech Intelligibility Prediction Leveraging a Noisy-Speech ASR Pre-Trained Model

Haolan Wang, Amin Edraki, Wai-Yip Chan, Iván López-Espejo, Jesper Jensen

Recent advances in deep learning have improved the capabilities of data-driven speech intelligibility prediction (SIP) algorithms. Nevertheless, the scarcity of speech intelligibility datasets limits the development of such algorithms. This study introduces a set of no-reference SIP algorithms that leverage a pre-trained wav2vec 2.0 backbone. We adapt wav2vec 2.0 for automatic speech recognition under additive noise conditions using low-rank adaptation, a parameter-efficient fine-tuning method. We demonstrate that no-reference SIP algorithms designed with this approach can be trained with a moderate amount of data. The best designs perform on par with, or even better than, a state-of-the-art reference-based SIP algorithm across a variety of datasets comprising different degradation types.
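
The idea behind low-rank adaptation can be illustrated with a minimal sketch on a single linear layer. This is not the authors' implementation and does not involve wav2vec 2.0: the class name `LoRALinear`, the helper `matvec`, and the parameters `rank` and `alpha` are hypothetical names chosen for illustration. The sketch assumes the common formulation in which a frozen weight matrix W is augmented by a trainable low-rank product, so the layer computes W x + (alpha / rank) B A x, with B initialised to zero so that the adapted layer starts out identical to the pre-trained one.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Toy linear layer with a low-rank update (illustrative assumption only)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pre-trained weights, shape d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        # A: rank x d_in, small random values; B: d_out x rank, zeros.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)                # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))   # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        # Only A and B would be updated during fine-tuning.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a d_out x d_in layer, the trainable parameter count is rank x (d_in + d_out) rather than d_in x d_out, which is where the parameter efficiency comes from when the rank is small relative to the layer dimensions.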