ISCA Archive Interspeech 2014

Task-aware deep bottleneck features for spoken language identification

Bing Jiang, Yan Song, Si Wei, Ian Vince McLoughlin, Li-Rong Dai

Recently, deep bottleneck features (DBF), extracted from a deep neural network (DNN) containing a narrow bottleneck layer, have been applied to language identification (LID) and yield significant performance improvements over state-of-the-art methods on NIST LRE 2009. However, the DNN is trained on a large corpus of a specific language, which is not directly related to the LID task. More recently, lattice-based discriminative training methods for extracting more targeted DBF have been proposed for ASR. Inspired by this, this paper proposes tuning the pre-trained DNN parameters using an LID-specific training corpus, which may make the resulting DBF, termed Discriminative DBF (D2BF), more discriminative and task-aware. Specifically, the maximum mutual information (MMI) criterion, with gradient descent, is applied to update the DNN parameters of the bottleneck layer in an iterative fashion. We evaluate the performance of the proposed D2BF using different back-end models, including GMM-MMI and i-vector, on the 6 most confused languages selected from NIST LRE 2009. The results show that the proposed D2BF is more appropriate and effective than the original DBF.
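The core idea of the abstract, updating only the bottleneck-layer parameters of a frozen pre-trained network by gradient steps on a discriminative criterion, can be sketched in a toy NumPy example. This is not the authors' implementation: the network shape, data, and learning rate are all made up for illustration, and the objective used here is the mean log posterior of the correct language, which the MMI criterion reduces to under equal language priors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: 20-dim input frames, 8-dim bottleneck, 6 languages
D_in, D_bn, n_lang = 20, 8, 6

# "Pre-trained" weights: input->bottleneck (W_bn, the layer we fine-tune)
# and bottleneck->language posteriors (W_out, kept frozen).
W_bn = rng.standard_normal((D_in, D_bn)) * 0.1
W_out = rng.standard_normal((D_bn, n_lang)) * 0.1

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, W_bn):
    H = np.tanh(X @ W_bn)      # bottleneck activations (the DBF)
    P = softmax(H @ W_out)     # language posteriors
    return H, P

def objective(P, y):
    # Mean log posterior of the correct language; equals the MMI
    # criterion when language priors are equal.
    return np.mean(np.log(P[np.arange(len(y)), y] + 1e-12))

# Synthetic labelled LID data
X = rng.standard_normal((200, D_in))
y = rng.integers(0, n_lang, size=200)

lr = 0.5
before = objective(forward(X, W_bn)[1], y)
for _ in range(100):
    H, P = forward(X, W_bn)
    # Backprop the objective's gradient to the bottleneck weights only
    dlogits = (np.eye(n_lang)[y] - P) / len(y)   # d(obj)/d(output logits)
    dH = dlogits @ W_out.T                       # through frozen W_out
    dpre = dH * (1.0 - H ** 2)                   # tanh derivative
    W_bn += lr * (X.T @ dpre)                    # ascend the criterion
after = objective(forward(X, W_bn)[1], y)
```

After the loop, `objective` should be higher than before tuning, i.e. the bottleneck activations extracted as features have become more discriminative for the LID labels while the rest of the network stayed fixed, which mirrors the D2BF idea at a very small scale.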


doi: 10.21437/Interspeech.2014-604

Cite as: Jiang, B., Song, Y., Wei, S., McLoughlin, I.V., Dai, L.-R. (2014) Task-aware deep bottleneck features for spoken language identification. Proc. Interspeech 2014, 3012-3016, doi: 10.21437/Interspeech.2014-604

@inproceedings{jiang14b_interspeech,
  author={Bing Jiang and Yan Song and Si Wei and Ian Vince McLoughlin and Li-Rong Dai},
  title={{Task-aware deep bottleneck features for spoken language identification}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={3012--3016},
  doi={10.21437/Interspeech.2014-604},
  issn={2308-457X}
}