ISCA Archive Interspeech 2012

Application of pretrained deep neural networks to large vocabulary speech recognition

Navdeep Jaitly, Patrick Nguyen, Andrew Senior, Vincent Vanhoucke

The use of Deep Belief Networks (DBN) to pretrain Neural Networks has recently led to a resurgence in the use of Artificial Neural Network - Hidden Markov Model (ANN/HMM) hybrid systems for Automatic Speech Recognition (ASR). In this paper we report results of a DBN-pretrained context-dependent ANN/HMM system trained on two datasets that are much larger than any reported previously with DBN-pretrained ANN/HMM systems - 5870 hours of Voice Search and 1400 hours of YouTube data. On the first dataset, the pretrained ANN/HMM system outperforms the best Gaussian Mixture Model - Hidden Markov Model (GMM/HMM) baseline, built with a much larger dataset, by 3.7% absolute WER, while on the second dataset it outperforms the GMM/HMM baseline by 2.9% absolute. Maximum Mutual Information (MMI) fine-tuning and model combination using Segmental Conditional Random Fields (SCARF) give additional gains of 0.1% and 0.4% on the first dataset, and 0.6% and 1.1% absolute on the second dataset.

Index Terms: Deep Belief Networks, Acoustic Modeling, Artificial Neural Network, ANN/HMM


doi: 10.21437/Interspeech.2012-10

Cite as: Jaitly, N., Nguyen, P., Senior, A., Vanhoucke, V. (2012) Application of pretrained deep neural networks to large vocabulary speech recognition. Proc. Interspeech 2012, 2578-2581, doi: 10.21437/Interspeech.2012-10

@inproceedings{jaitly12_interspeech,
  author={Navdeep Jaitly and Patrick Nguyen and Andrew Senior and Vincent Vanhoucke},
  title={{Application of pretrained deep neural networks to large vocabulary speech recognition}},
  year=2012,
  booktitle={Proc. Interspeech 2012},
  pages={2578--2581},
  doi={10.21437/Interspeech.2012-10},
  issn={2958-1796}
}