ISCA Archive Clarity 2025

Machine learning for computational audiology: Prediction of auditory perception and improvement of speech signals based on deep learning

Bernd T. Meyer, Jana Roßbach, Dirk E. Hoffner, Nils L. Westhausen, Hartmut Schoon, Simon Weihe, Hendrik Kayser, Swati Vivekananthan, Kirsten Wagener, Thomas Brand, Jan Rennies-Hochmuth, Rainer Huber
Automatic speech recognition (ASR) and speech technology have improved fundamentally over recent decades. While Lippmann (1997) reported an order-of-magnitude difference between the recognition rates of humans and ASR systems, the human–machine gap has since been closed for specific tasks. These advances, together with deep-learning approaches to speech enhancement, have motivated the use of speech technology in the context of hearing aids. Blind models that predict speech intelligibility from speech-technology components could serve as a model-in-the-loop in future hearing aids. In our previous work, we analyzed phone posteriors obtained from a deep neural network, which become degraded in the presence of noise and reverberation. We quantified this degradation for phone posteriors with an additional binaural integration stage, which enables accurate predictions of speech intelligibility in spatial scenes. The model can also be used to select the hearing-aid algorithm that maximizes speech intelligibility for hearing-impaired listeners.

In a second line of research, we applied deep learning to estimate the filter coefficients of a beamformer in hearing aids. This system has an algorithmic delay of 5.4 ms and a relatively small footprint (below 200k parameters), and it improves speech intelligibility for hearing-impaired listeners, as shown in a subjective study.
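The degradation of phone posteriors mentioned above can be illustrated with a minimal sketch: in noise, a DNN's per-frame posterior distribution over phone classes tends to flatten, which can be summarized by its mean entropy. The function name, the toy posterior values, and the use of entropy as the degradation measure are illustrative assumptions, not the authors' actual model.

```python
import numpy as np

def mean_posterior_entropy(posteriors):
    """Mean per-frame entropy (nats) of a (frames, phones) posterior matrix.

    Flatter (more uniform) posteriors yield higher entropy; here this serves
    as a hypothetical proxy for the noise-induced degradation described in
    the abstract, not the authors' actual intelligibility measure.
    """
    p = np.clip(posteriors, 1e-12, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(p * np.log(p), axis=1)))

# Toy posteriors over four phone classes (hand-made, not from a real DNN):
clean = np.array([[0.94, 0.02, 0.02, 0.02],   # confident, near one-hot
                  [0.02, 0.94, 0.02, 0.02]])
noisy = np.array([[0.40, 0.30, 0.20, 0.10],   # flattened by noise
                  [0.30, 0.30, 0.20, 0.20]])

print(mean_posterior_entropy(clean) < mean_posterior_entropy(noisy))  # True
```

In the sketch, the noisy posteriors have higher mean entropy than the clean ones, mirroring the degradation that the prediction model quantifies (with an additional binaural integration stage in the actual work).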