ISCA Archive Interspeech 2014

Robust language identification using convolutional neural network features

Sriram Ganapathy, Kyu Han, Samuel Thomas, Mohamed Omar, Maarten Van Segbroeck, Shrikanth S. Narayanan

The language identification (LID) task in the Robust Automatic Transcription of Speech (RATS) program is challenging due to the noisy nature of the audio data collected over highly degraded radio communication channels as well as the use of short-duration speech segments for testing. In this paper, we report the recent advances made in the RATS LID task by using bottleneck features from a convolutional neural network (CNN). The CNN, which is trained with labelled data from one of the target languages, generates bottleneck features which are used in a Gaussian mixture model (GMM)-ivector LID system. The CNN bottleneck features provide substantial complementary information to the conventional acoustic features, even on languages not seen in its training. Using these bottleneck features in conjunction with acoustic features, we obtain significant improvements (an average relative improvement of 25% in terms of equal error rate (EER) compared to the corresponding acoustic system) for the LID task. Furthermore, these improvements are consistent for various choices of acoustic features as well as speech segment durations.
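
The "relative improvement" quoted above is a ratio against the baseline EER rather than an absolute difference. A minimal sketch of that arithmetic follows; the function name and the example EER values are hypothetical illustrations, not figures from the paper.

```python
def relative_improvement(baseline_eer: float, new_eer: float) -> float:
    """Return the relative EER reduction as a fraction of the baseline EER."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical EERs in percent (NOT results from the paper):
# an acoustic-only system at 8.0% and a combined system at 6.0%.
print(relative_improvement(8.0, 6.0))  # 0.25, i.e. a 25% relative improvement
```

Under this definition, a drop from 8.0% to 6.0% EER is a 2-point absolute reduction but a 25% relative one.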

doi: 10.21437/Interspeech.2014-419

Cite as: Ganapathy, S., Han, K., Thomas, S., Omar, M., Segbroeck, M.V., Narayanan, S.S. (2014) Robust language identification using convolutional neural network features. Proc. Interspeech 2014, 1846-1850, doi: 10.21437/Interspeech.2014-419

  author={Sriram Ganapathy and Kyu Han and Samuel Thomas and Mohamed Omar and Maarten Van Segbroeck and Shrikanth S. Narayanan},
  title={{Robust language identification using convolutional neural network features}},
  booktitle={Proc. Interspeech 2014},