ISCA Archive Interspeech 2014

Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing

Natalia Tomashenko, Yuri Khokhlov

In this paper we propose a novel speaker adaptation method for a context-dependent deep neural network HMM (CD-DNN-HMM) acoustic model. The approach uses GMM-derived features as the input to the DNN. This way of processing features for DNNs makes it possible to use GMM-HMM adaptation algorithms in the neural network framework. Adaptation to a new speaker can be performed simply by adapting an auxiliary GMM-HMM model used in the calculation of the GMM-derived features, and can be regarded as feature-space adaptation for a DNN system. In this work, traditional maximum a posteriori (MAP) adaptation is performed for the auxiliary GMM-HMM model. Experiments show that the proposed adaptation technique can provide, on average, a 5%–36% relative word error rate (WER) reduction on different adaptation sets under a supervised adaptation setup, compared to speaker-independent (SI) CD-DNN-HMM systems. In addition, several multi-stream combination techniques are examined in order to improve the performance of the baseline SI model.
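The abstract does not spell out the exact form of the GMM-derived features or the MAP configuration. The following minimal sketch illustrates the general idea under stated assumptions, and it is not the authors' implementation. It assumes a single diagonal-covariance GMM instead of a full auxiliary GMM-HMM. It assumes mean-only MAP adaptation in the usual relevance-factor form, where the adapted mean is (tau * prior_mean + sum_t gamma_k(t) x_t) / (tau + sum_t gamma_k(t)). It also uses per-component log-likelihoods as a hypothetical stand-in for the GMM-derived features. All function and parameter names (`map_adapt_means`, `gmm_derived_features`, `tau`) are illustrative.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian for one frame."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_derived_features(x, weights, means, variances):
    """Hypothetical GMM-derived feature vector: log(w_k) + log N(x; mu_k, var_k)
    for every component k. The paper's exact feature definition may differ."""
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def posteriors(x, weights, means, variances):
    """Component occupation probabilities gamma_k(x), via a stable softmax."""
    logs = gmm_derived_features(x, weights, means, variances)
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, weights, means, variances, tau=10.0):
    """Mean-only MAP update (assumed relevance factor tau).
    Weights and variances are left unchanged in this sketch."""
    dim = len(means[0])
    occ = [0.0] * len(means)                # sum_t gamma_k(t)
    first = [[0.0] * dim for _ in means]    # sum_t gamma_k(t) * x_t
    for x in frames:
        for k, g in enumerate(posteriors(x, weights, means, variances)):
            occ[k] += g
            for d in range(dim):
                first[k][d] += g * x[d]
    # Interpolate between the speaker-independent prior mean and the
    # speaker's data: components with little data stay near the prior.
    return [
        [(tau * means[k][d] + first[k][d]) / (tau + occ[k]) for d in range(dim)]
        for k in range(len(means))
    ]


if __name__ == "__main__":
    weights = [0.5, 0.5]
    means = [[0.0], [4.0]]
    variances = [[1.0], [1.0]]
    speaker_frames = [[1.0]] * 20          # toy adaptation data
    adapted = map_adapt_means(speaker_frames, weights, means, variances)
    print("adapted means:", adapted)
    # The features for a new frame are then computed with the adapted model.
    print("features:", gmm_derived_features([1.0], weights, adapted, variances))
```

Because the DNN only sees the features, swapping the speaker-independent means for the adapted ones changes the DNN input without retraining the network. This is the sense in which the abstract describes the method as feature-space adaptation. With no adaptation frames the update returns the prior means unchanged. As `tau` grows, the adapted means stay closer to the prior.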


doi: 10.21437/Interspeech.2014-501

Cite as: Tomashenko, N., Khokhlov, Y. (2014) Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing. Proc. Interspeech 2014, 2997-3001, doi: 10.21437/Interspeech.2014-501

@inproceedings{tomashenko14_interspeech,
  author={Natalia Tomashenko and Yuri Khokhlov},
  title={{Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={2997--3001},
  doi={10.21437/Interspeech.2014-501},
  issn={2308-457X}
}