It has previously been suggested that ensembles of central auditory neurons optimize a sustained-firing criterion as part of the underlying code for representing sound. Moreover, computational studies have shown that optimizing such a criterion yields ensembles of spectro-temporal receptive fields akin to those observed in physiological studies. In this study, we show that these emergent receptive fields contour the high-energy modulations in speech. A simple 2D filter derived from them is shown to improve the performance of state-of-the-art phoneme recognition systems by 6.2% absolute on average, under both additive noise and reverberation.
Index Terms: robust feature extraction, bio-inspired features, sustained neural firings