ISCA Archive Eurospeech 1997

On the interplay between auditory-based features and locally recurrent neural networks for robust speech recognition in noise

Jurgen Tchorz, Klaus Kasper, Herbert Reininger, Bilger Kollmeier

Combining a model of auditory perception (PEMO) as the feature extractor with a Locally Recurrent Neural Network (LRNN) as the classifier yields promising ASR results in noise. Our study focuses on the interplay between the two techniques and their ability to complement each other in robust speech recognition. We performed recognition experiments with modifications of PEMO processing concerning amplitude compression and envelope modulation filtering. The results show that the distinct, sparse peaks of the PEMO speech representation, which are well maintained in noise, are sufficient cues for LRNN-based recognition because the LRNN can exploit information that is distributed over time. Enhanced envelope modulation bandpass filtering of PEMO feature vectors better reflects the average modulation spectrum of speech and further decreases the influence of noise.
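To illustrate what envelope modulation bandpass filtering of feature vectors can look like, the following is a minimal sketch, not the filter used in PEMO. It assumes feature vectors sampled at 100 frames per second and an ideal FFT-based passband of 1 to 16 Hz; the function name `modulation_bandpass`, the frame rate, and the band edges are all hypothetical choices made for illustration. The abstract does not state the actual filter design.

```python
import numpy as np

def modulation_bandpass(features, frame_rate_hz=100.0, low_hz=1.0, high_hz=16.0):
    """Band-pass filter each feature channel along the time axis.

    features: array of shape (num_frames, num_channels).
    frame_rate_hz, low_hz, high_hz: illustrative assumptions,
    not the settings reported in the paper.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]

    # Modulation spectrum of every channel (FFT over time).
    spectrum = np.fft.rfft(features, axis=0)
    mod_freqs = np.fft.rfftfreq(num_frames, d=1.0 / frame_rate_hz)

    # Zero all modulation frequencies outside the passband.
    # This is an ideal "brick-wall" filter, chosen for simplicity.
    keep = (mod_freqs >= low_hz) & (mod_freqs <= high_hz)
    spectrum[~keep, :] = 0.0

    return np.fft.irfft(spectrum, n=num_frames, axis=0)

if __name__ == "__main__":
    t = np.arange(200) / 100.0                      # 2 s at 100 frames/s
    speech_like = np.sin(2 * np.pi * 4 * t)         # 4 Hz envelope fluctuation
    offset = 3.0                                    # stationary background level
    flutter = 0.5 * np.sin(2 * np.pi * 40 * t)      # fast fluctuation
    channel = (offset + speech_like + flutter)[:, None]

    filtered = modulation_bandpass(channel)
    print(np.allclose(filtered[:, 0], speech_like))
```

In this toy input, the constant offset (0 Hz) and the 40 Hz component fall outside the assumed passband and are removed, while the 4 Hz fluctuation passes through. The syllable-rate region around 4 Hz is commonly reported as where the modulation spectrum of speech is strongest, which is why the example places its speech-like component there.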


doi: 10.21437/Eurospeech.1997-549

Cite as: Tchorz, J., Kasper, K., Reininger, H., Kollmeier, B. (1997) On the interplay between auditory-based features and locally recurrent neural networks for robust speech recognition in noise. Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997), 2075-2078, doi: 10.21437/Eurospeech.1997-549

@inproceedings{tchorz97_eurospeech,
  author={Jurgen Tchorz and Klaus Kasper and Herbert Reininger and Bilger Kollmeier},
  title={{On the interplay between auditory-based features and locally recurrent neural networks for robust speech recognition in noise}},
  year=1997,
  booktitle={Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997)},
  pages={2075--2078},
  doi={10.21437/Eurospeech.1997-549},
  issn={1018-4074}
}