ISCA Archive Interspeech 2025

Attention Models and Auditory Transduction Features for Noise Robustness

Cathal Ó Faoláin, Andrew Hines

Human abilities surpass current speech processing systems in complex, noisy environments. While popular inputs for Automatic Speech Recognition (ASR) systems, such as raw acoustic signals and Mel spectrograms, perform well in quiet conditions, their effectiveness declines in noise. A recently developed generative WaveNet-based model emulates human auditory transduction in real time, offering alternative input features through its “IHCogram” outputs. We investigate these IHCograms across a range of signal-to-noise ratios (SNRs) using state-of-the-art feature encoders. Our findings show that IHCograms significantly enhance phoneme recognition in noisy conditions with minimal computational overhead, regardless of the model encoder used. Additionally, we introduce our Attention Feature Encoder (AFE) models, which leverage the channel structure of IHCograms and outperform existing feature encoders while using fewer parameters.