ISCA Archive Interspeech 2014

Variable-component deep neural network for robust speech recognition

Rui Zhao, Jinyu Li, Yifan Gong

In this paper, we propose the variable-component DNN (VCDNN) to improve the robustness of the context-dependent deep neural network hidden Markov model (CD-DNN-HMM). The method is inspired by the variable-parameter HMM (VPHMM), in which the variation of model parameters is modeled as a set of polynomial functions of the environmental signal-to-noise ratio (SNR); at test time, the model parameters are recomputed according to the estimated testing SNR. In VCDNN, we refine two types of DNN components: (1) the weight matrix and bias, and (2) the output of each layer. Experimental results on the Aurora4 task show that VCDNN achieves 6.53% and 5.92% relative word error rate reduction (WERR) over the standard DNN for the two methods, respectively. Under unseen SNR conditions, VCDNN gives even better results (8.46% relative WERR for the DNN with variable weight matrix and bias, 7.08% relative WERR for the DNN with variable layer output). Moreover, a VCDNN with 1024 units per hidden layer beats the standard DNN with 2048 units per hidden layer, yielding a 3.22% relative WERR at roughly half the computational and memory cost, which shows a superior ability to produce sharper and more compact models.
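As a rough illustration of the first variant (variable weight matrix and bias), the sketch below models a layer's parameters as polynomial functions of SNR and reassembles them from an estimated SNR at test time. This is a minimal sketch of the idea only, not the paper's implementation: the class and variable names are hypothetical, and a sigmoid hidden layer with a second-order polynomial is assumed.

import numpy as np

class VariableComponentLayer:
    """Hypothetical layer whose weight matrix and bias are polynomials of SNR:
         W(snr) = sum_k W_k * snr**k,   b(snr) = sum_k b_k * snr**k
    The component sets {W_k} and {b_k} would be estimated during training."""

    def __init__(self, in_dim, out_dim, order=2, rng=None):
        rng = rng or np.random.default_rng(0)
        # One (W_k, b_k) pair per polynomial order (order + 1 components).
        self.W = [rng.standard_normal((out_dim, in_dim)) * 0.01
                  for _ in range(order + 1)]
        self.b = [np.zeros(out_dim) for _ in range(order + 1)]

    def forward(self, x, snr):
        # Recompute the effective parameters from the estimated SNR,
        # then apply the usual affine transform and sigmoid nonlinearity.
        W = sum(Wk * snr**k for k, Wk in enumerate(self.W))
        b = sum(bk * snr**k for k, bk in enumerate(self.b))
        z = W @ x + b
        return 1.0 / (1.0 + np.exp(-z))

With a second-order polynomial, each layer stores three weight matrices and three bias vectors; at decoding time the effective W and b need only be reassembled once per utterance (or per frame, if SNR is tracked more finely) from the estimated SNR.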


doi: 10.21437/Interspeech.2014-154

Cite as: Zhao, R., Li, J., Gong, Y. (2014) Variable-component deep neural network for robust speech recognition. Proc. Interspeech 2014, 2719-2723, doi: 10.21437/Interspeech.2014-154

@inproceedings{zhao14_interspeech,
  author={Rui Zhao and Jinyu Li and Yifan Gong},
  title={{Variable-component deep neural network for robust speech recognition}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={2719--2723},
  doi={10.21437/Interspeech.2014-154},
  issn={2308-457X}
}