ISCA Archive Interspeech 2014

Investigation of deep neural networks for robust recognition of nonlinearly distorted speech

Ladislav Seps, Jiri Malek, Petr Cerva, Jan Nouza

This paper studies the use of a hybrid context-dependent Deep Neural Network Hidden Markov Model (DNN-HMM) architecture for robust recognition of speech affected by real-world nonlinear distortions. We consider two types of distortion: a) signals distorted in the analog domain by an overgained microphone preamplifier, and b) recordings exhibiting unnatural spectral sparseness caused by excessive denoising or low-bit-rate compression. We compare the performance of the DNN-HMM architecture with that of a conventional system based on context-dependent Gaussian Mixture Model (GMM)-HMMs, which applies channel/speaker adaptation and/or feature compensation in the front-end via Histogram Equalization (HEQ). We show that the DNN-HMM architecture achieves a significantly lower Word Error Rate (WER) on the considered distorted datasets, with a relative WER reduction of more than 60%. We also investigate the usefulness of feature compensation via HEQ for a DNN-HMM system and show that it can be helpful in the case of shallower networks.
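For readers unfamiliar with the two quantities the abstract relies on, the following is a minimal sketch, not the authors' implementation. It assumes a common rank-based form of HEQ that maps each feature dimension onto a standard Gaussian reference, and the usual definition of relative WER reduction. The function names (`heq_dimension`, `heq`, `relative_wer_reduction`) and the choice of reference distribution are assumptions made for illustration only.

```python
from statistics import NormalDist


def heq_dimension(values, reference=NormalDist(0.0, 1.0)):
    """Map one feature dimension onto the reference distribution by rank.

    Assumption: the reference is a standard Gaussian; the paper's abstract
    does not state which reference distribution is used.
    """
    n = len(values)
    # Indices of the values in ascending order (ties keep their input order).
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # Empirical CDF estimate, kept strictly inside (0, 1).
        cdf = (rank + 0.5) / n
        # Replace the value by the reference quantile at the same CDF level.
        out[idx] = reference.inv_cdf(cdf)
    return out


def heq(frames):
    """Equalize each dimension of a list of feature vectors independently."""
    if not frames:
        return []
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    equalized = [heq_dimension(list(d)) for d in dims]
    return [list(f) for f in zip(*equalized)]  # transpose back to frames


def relative_wer_reduction(wer_baseline, wer_new):
    """Relative WER reduction in percent, with respect to the baseline."""
    return 100.0 * (wer_baseline - wer_new) / wer_baseline


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    toy_frames = [[3.0, 10.0], [1.0, 30.0], [2.0, 20.0]]
    print(heq(toy_frames))
    print(relative_wer_reduction(50.0, 20.0))
```

Because the mapping is monotonic within each dimension, the ordering of the values is preserved while their histogram is reshaped to match the reference, which is what lets this kind of compensation undo a monotonic nonlinear distortion of the features.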