ISCA Archive Interspeech 2025

Extended High-frequency Cues to Phoneme Recognition: Insights from ASR

Zhe-chen Guo, Bharath Chandrasekaran

There is emerging evidence that extended high frequencies (EHFs; >8 kHz) improve speech perception in noise. Yet the mechanisms underlying this benefit remain unclear. We investigated whether EHFs contribute to phoneme recognition using an automatic speech recognition (ASR) model. A neural network model was trained to decode phonemes from cochleagrams of broadband speech and of speech low-pass filtered at 8 or 6 kHz, in quiet and in masked conditions with varying target-to-masker ratios (TMRs) and target-masker spatial separations. Compared with filtered speech, broadband speech improved phoneme recognition accuracy in masked conditions, particularly at lower TMRs, but showed no benefit in quiet. Removing EHFs increased the probability that the model omitted a phoneme, and this increase was larger for consonants than for vowels. The findings suggest that the EHF benefit in adverse conditions may partly arise from enhanced phoneme processing, highlighting the potential of improving audiometry and ASR by including EHFs.
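As an illustration of what a masked condition at a given TMR involves, the sketch below scales a masker so that the target-to-masker level ratio matches a requested value in dB and then adds it to the target. It is a minimal sketch under stated assumptions, not the authors' stimulus pipeline: the abstract does not say how levels were measured or which TMR values were used, so the RMS-based definition, the -6 dB value, the synthetic tone and noise signals, and the helper names `rms` and `mix_at_tmr` are all hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def mix_at_tmr(target, masker, tmr_db):
    """Scale the masker so the target-to-masker RMS ratio equals tmr_db (in dB),
    then add it to the target. Returns the mixture and the scaled masker."""
    # TMR (dB) = 20 * log10(rms(target) / rms(gain * masker)), solved for gain.
    gain = rms(target) / (rms(masker) * 10.0 ** (tmr_db / 20.0))
    scaled_masker = [gain * m for m in masker]
    mixture = [t + m for t, m in zip(target, scaled_masker)]
    return mixture, scaled_masker


# Synthetic stand-ins (assumptions): a 1 kHz tone as "target", Gaussian noise as "masker".
sample_rate = 44100
target = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(sample_rate)]
rng = random.Random(0)
masker = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# Assumed example value: a negative TMR means the masker is more intense than the target.
mixture, scaled_masker = mix_at_tmr(target, masker, tmr_db=-6.0)
achieved_tmr_db = 20.0 * math.log10(rms(target) / rms(scaled_masker))
```

Lower TMR values give a larger masker gain, which corresponds to the more adverse conditions in which the abstract reports the largest broadband advantage.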