ISCA Archive Interspeech 2012

Context-dependent MLPs for LVCSR: TANDEM, hybrid or both?

Zoltán Tüske, Martin Sundermeyer, Ralf Schlüter, Hermann Ney

Gaussian Mixture Model (GMM) and Multi Layer Perceptron (MLP) based acoustic models are compared on a French large vocabulary continuous speech recognition (LVCSR) task. In addition to optimizing the output layer size of the MLP, the effect of the deep neural network structure is also investigated. Moreover, using different linear transformations (time derivatives, LDA, CMLLR) on conventional MFCC, the study is also extended to MLP based probabilistic and bottle-neck TANDEM features. Results show that using either the hybrid or bottle-neck TANDEM approach leads to similar recognition performance. However, the best performance is achieved when deep MLP acoustic models are trained on concatenated cepstral and context-dependent bottle-neck features. Further experiments reveal the importance of the neighbouring frames in case of MLP based modeling, and that its gain over GMM acoustic models is strongly reduced by more complex features.
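The two ways of using an MLP mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the edge handling, and the toy dimensions are all assumptions made for illustration. In the TANDEM approach, MLP-derived features (posteriors or bottle-neck activations) are appended to the cepstral features. In the hybrid approach, MLP state posteriors are divided by state priors to obtain scaled likelihoods for the HMM. Stacking neighbouring frames is shown as one common way of giving the MLP temporal context.

```python
import math

def stack_context(frames, left=1, right=1):
    """Concatenate each frame with its neighbours.

    Assumption: frames beyond the utterance edges repeat the border frame.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for k in range(t - left, t + right + 1):
            window.extend(frames[min(max(k, 0), n - 1)])
        stacked.append(window)
    return stacked

def tandem_features(cepstra, mlp_features):
    """TANDEM: append MLP-derived features to the cepstra, frame by frame."""
    return [c + m for c, m in zip(cepstra, mlp_features)]

def hybrid_log_scaled_likelihoods(posteriors, priors):
    """Hybrid: log P(s|x) - log P(s), i.e. log p(x|s) up to a constant."""
    return [math.log(p) - math.log(q) for p, q in zip(posteriors, priors)]

# Toy data (hypothetical): three frames of 2-dimensional cepstra and
# 1-dimensional bottle-neck activations.
cepstra = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
bottleneck = [[0.1], [0.2], [0.3]]

mlp_input = stack_context(cepstra)             # 6 values per frame
tandem = tandem_features(cepstra, bottleneck)  # 3 values per frame
scores = hybrid_log_scaled_likelihoods([0.5, 0.5], [0.25, 0.75])
```

The configuration reported as best in the abstract combines both ideas: the concatenated features play the role of `tandem` here and serve as input to a deep MLP that is then used as a hybrid acoustic model.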

Index Terms: HMM, GMM, MLP, bottle-neck, hybrid, ASR, TANDEM

doi: 10.21437/Interspeech.2012-5

Cite as: Tüske, Z., Sundermeyer, M., Schlüter, R., Ney, H. (2012) Context-dependent MLPs for LVCSR: TANDEM, hybrid or both? Proc. Interspeech 2012, 18-21, doi: 10.21437/Interspeech.2012-5

  author={Zoltán Tüske and Martin Sundermeyer and Ralf Schlüter and Hermann Ney},
  title={{Context-dependent MLPs for LVCSR: TANDEM, hybrid or both?}},
  booktitle={Proc. Interspeech 2012},