ISCA Archive Interspeech 2022

ELO-SPHERES intelligibility prediction model for the Clarity Prediction Challenge 2022

Mark Huckvale, Gaston Hilkhuysen

This paper describes and evaluates the ELO-SPHERES project sentence intelligibility model for the Clarity Prediction Challenge 2022. The aim of the model is to predict the intelligibility of enhanced speech to hearing-impaired listeners. The inputs to the model are binaural processed audio recordings of short sentences generated in a simulated noisy and reverberant environment, together with the original source audio. The output is a prediction of the intelligibility of each sentence, expressed as the percentage of words correct, for a known hearing-impaired listener characterized by a pure-tone audiogram. Models are evaluated in terms of the root mean squared (RMS) error of prediction. We approached this problem in three stages: (i) evaluation of the influence of the scene metadata on scores, (ii) construction of classifiers to estimate scene metadata from audio, and (iii) training of a non-linear regression model on the challenge data, evaluated using 5-fold cross-validation. On the test data, a baseline system using only the standard short-time objective intelligibility metric on the better ear achieved an RMS prediction error of 27%, while our model, which also took into account given and estimated scene data, achieved an RMS error of 22%.
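As an illustration of the evaluation procedure, the following minimal sketch shows how an RMS prediction error could be estimated with 5-fold cross-validation. It is not the authors' implementation: the function names (`rmse`, `kfold_indices`, `fit_line`, `cross_validated_rmse`) are hypothetical, the data are synthetic, and a simple least-squares line stands in for the non-linear regression model described above.

```python
import math
import random


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(total / len(observed))


def kfold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(x, y):
    """Least-squares fit y = a*x + b (a stand-in for the real regression model)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def cross_validated_rmse(x, y, k=5, seed=0):
    """Predict each item from a model trained on the other folds, then pool the errors."""
    preds = [0.0] * len(x)
    for held_out in kfold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            # Clip predictions to the valid range of percentage words correct.
            preds[i] = min(100.0, max(0.0, a * x[i] + b))
    return rmse(preds, y)


# Synthetic data (assumption): an intelligibility metric in [0, 1] and noisy
# percentage-words-correct scores loosely related to it.
rng = random.Random(1)
metric = [rng.random() for _ in range(200)]
score = [min(100.0, max(0.0, 100 * m + rng.gauss(0, 10))) for m in metric]
cv_error = cross_validated_rmse(metric, score)
```

In this sketch the held-out predictions from all folds are pooled before the error is computed, so every sentence contributes exactly once to the final RMS figure.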