ISCA Archive Interspeech 2014

Speaker adaptation of DNN-based ASR with i-vectors: does it actually adapt models to speakers?

Mickael Rouvier, Benoit Favre

Deep neural networks (DNNs) are currently very successful for acoustic modeling in ASR systems. One of the main challenges with DNNs is unsupervised speaker adaptation from an initial speaker clustering, because DNNs have a very large number of parameters. Recently, a method has been proposed to adapt DNNs to speakers by combining speaker-specific information (in the form of i-vectors computed at the speaker-cluster level) with fMLLR-transformed acoustic features. In this paper, we try to gain insight into what kind of adaptation is performed on DNNs when stacking i-vectors with acoustic features, and what information exactly is carried by i-vectors. We observe on the REPERE corpus that DNNs trained on i-vector features concatenated with fMLLR-transformed acoustic features lead to a gain of 0.7 points. The experiments show that i-vector stacking in DNN acoustic models performs not only speaker adaptation, but also adaptation to acoustic conditions.
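
The stacking step mentioned in the abstract can be illustrated with a minimal sketch: one i-vector per speaker cluster is appended to every acoustic frame of that cluster before the frames are fed to the DNN. This is not the authors' implementation. The function name `stack_ivector` and the dimensions used (40-dimensional fMLLR frames, a 100-dimensional i-vector) are assumptions chosen only for illustration, and the arrays hold placeholder values rather than real features.

```python
import numpy as np


def stack_ivector(frames: np.ndarray, ivector: np.ndarray) -> np.ndarray:
    """Append the same i-vector to every acoustic frame.

    frames:  (num_frames, feat_dim) array, e.g. fMLLR-transformed features.
    ivector: (ivec_dim,) array computed for the frames' speaker cluster.
    Returns a (num_frames, feat_dim + ivec_dim) array.
    """
    if frames.ndim != 2 or ivector.ndim != 1:
        raise ValueError("expected 2-D frames and a 1-D i-vector")
    # Repeat the cluster-level i-vector once per frame.
    tiled = np.tile(ivector, (frames.shape[0], 1))
    # Concatenate along the feature axis.
    return np.concatenate([frames, tiled], axis=1)


# Hypothetical sizes: 300 frames of 40-dim features, 100-dim i-vector.
frames = np.zeros((300, 40))   # placeholder for fMLLR features
ivector = np.ones(100)         # placeholder for a cluster-level i-vector
dnn_input = stack_ivector(frames, ivector)
```

Because the i-vector is constant across all frames of a cluster, it acts as a fixed side input describing the cluster rather than as a time-varying feature, which is consistent with the abstract's observation that it may encode acoustic conditions as well as speaker identity.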


doi: 10.21437/Interspeech.2014-503

Cite as: Rouvier, M., Favre, B. (2014) Speaker adaptation of DNN-based ASR with i-vectors: does it actually adapt models to speakers? Proc. Interspeech 2014, 3007-3011, doi: 10.21437/Interspeech.2014-503

@inproceedings{rouvier14_interspeech,
  author={Mickael Rouvier and Benoit Favre},
  title={{Speaker adaptation of DNN-based ASR with i-vectors: does it actually adapt models to speakers?}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={3007--3011},
  doi={10.21437/Interspeech.2014-503},
  issn={2308-457X}
}