ISCA Archive Interspeech 2021

N-MTTL SI Model: Non-Intrusive Multi-Task Transfer Learning-Based Speech Intelligibility Prediction Model with Scenery Classification

Ľuboš Marcinek, Michael Stone, Rebecca Millman, Patrick Gaydecki

Applying speech enhancement algorithms in hearing aids does not always improve speech intelligibility, so classifying the acoustic environment beforehand could be important. However, previous speech intelligibility models do not provide any additional information about why speech intelligibility decreases. We propose a unique non-intrusive multi-task transfer learning-based speech intelligibility prediction model with scenery classification (N-MTTL SI model). The solution combines a Mel-spectrogram analysis of the degraded speech signal with transfer learning and multi-task learning to provide simultaneous speech intelligibility prediction (task 1) and scenery classification of ten real-world noise conditions (task 2). The model uses a pre-trained ResNet architecture as an encoder for feature extraction. The N-MTTL SI model performs well on both tasks: the RMSE of its speech intelligibility predictions is 3.76% for seen conditions and 4.06% for unseen conditions, and its classification accuracy is 98%. In addition, the proposed solution demonstrates the potential of pre-trained deep learning models in the domain of speech intelligibility prediction.
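The two figures of merit reported above, RMSE for the intelligibility predictions and accuracy for the scenery classification, can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the sample scores, and the noise-condition labels are all hypothetical, and intelligibility is assumed to be expressed in percent so that the RMSE comes out in percentage points.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between predicted and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def accuracy(predicted_labels, true_labels):
    """Fraction of items whose predicted label matches the true label."""
    if len(predicted_labels) != len(true_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Hypothetical task 1 data: intelligibility scores in percent.
predicted_si = [52.0, 81.5, 33.0, 95.0]
reference_si = [50.0, 85.0, 30.0, 97.0]

# Hypothetical task 2 data: noise-condition labels (names are made up).
predicted_scene = ["babble", "car", "street", "cafe"]
true_scene = ["babble", "car", "street", "train"]

print(f"RMSE: {rmse(predicted_si, reference_si):.2f} percentage points")
print(f"Accuracy: {accuracy(predicted_scene, true_scene):.0%}")
```

Reporting the two metrics separately mirrors the abstract, which gives one RMSE per condition type (seen and unseen) and a single classification accuracy.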