ISCA Archive Interspeech 2024

Joint prediction of subjective listening effort and speech intelligibility based on end-to-end learning

Dirk Eike Hoffner, Jana Roßbach, Bernd T. Meyer

Subjective listening effort and speech intelligibility are crucial aspects of human communication. Models that predict these measures are important tools for developing speech enhancement or compression algorithms. Predictions in real-life situations require non-intrusive models, which do not use a clean reference signal as additional input. This paper explores a non-intrusive model for the joint prediction of listening effort and speech intelligibility that is based on character probabilities obtained from an end-to-end automatic speech recognition system. The uncertainty of the character classification is quantified using an entropy-based measure and compared to subjective data from normal-hearing and hearing-impaired listeners. The proposed model achieves correlation values of at least 0.9 and a root-mean-square error at or below 5 percentage points for speech intelligibility, and outperforms an intrusive baseline in four out of six conditions.
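To illustrate the idea of quantifying classification uncertainty from character probabilities, the following is a minimal sketch and not the measure used in the paper, whose exact definition is not given here. It assumes the recognizer output is available as a matrix with one row per time frame and one column per character, and it uses the mean Shannon entropy per frame as the uncertainty value. The function name `mean_frame_entropy` and the toy inputs are hypothetical.

```python
import numpy as np


def mean_frame_entropy(probs, eps=1e-12):
    """Mean Shannon entropy (in bits) over time frames.

    probs: array of shape (frames, characters); each row holds the
    character probabilities for one frame (assumed layout).
    """
    probs = np.asarray(probs, dtype=float)
    # Renormalize each frame so that its probabilities sum to one.
    probs = probs / probs.sum(axis=1, keepdims=True)
    # Entropy per frame; eps avoids log2(0) for zero probabilities.
    frame_entropy = -np.sum(probs * np.log2(probs + eps), axis=1)
    return float(frame_entropy.mean())


# Toy data: a confident recognizer output versus a maximally uncertain one.
confident = np.array([[0.97, 0.01, 0.01, 0.01],
                      [0.01, 0.97, 0.01, 0.01]])
uncertain = np.full((2, 4), 0.25)

print(mean_frame_entropy(confident) < mean_frame_entropy(uncertain))
```

In this sketch, a flat distribution over characters yields high entropy and a peaked distribution yields low entropy. Such a value could then be mapped to listening effort or intelligibility ratings, but that mapping is outside the scope of this example.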