ISCA Archive Interspeech 2014

Neural network language models for low resource languages

Ankur Gandhe, Florian Metze, Ian Lane

For resource-rich languages, recent work has shown Neural Network based Language Models (NNLMs) to be an effective modeling technique for Automatic Speech Recognition, outperforming standard n-gram language models (LMs). For low-resource languages, however, the performance of NNLMs has not been well explored. In this paper, we evaluate the effectiveness of NNLMs for low-resource languages and show that NNLMs learn better word probabilities than state-of-the-art n-gram models even when the amount of training data is severely limited. We show that interpolated NNLMs obtain a lower WER than standard n-gram models, no matter the amount of training data. Additionally, we observe that with small amounts of data (approx. 100k training tokens), feed-forward NNLMs obtain lower perplexity than recurrent NNLMs, while for the larger data condition (500k–1M training tokens), recurrent NNLMs can obtain lower perplexity than feed-forward models.
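
The two quantities the abstract relies on, interpolated model probabilities and perplexity, can be illustrated with a minimal sketch. The abstract does not state which interpolation scheme was used, so linear interpolation with a single weight is an assumption here; the weight `lam`, the function names, and the per-token probabilities are hypothetical values chosen for illustration, not results from the paper.

```python
import math

def interpolate(p_nnlm, p_ngram, lam=0.5):
    """Linearly interpolate two lists of per-token probabilities.

    lam is the (assumed) weight on the NNLM; 1 - lam goes to the n-gram LM.
    """
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_nnlm, p_ngram)]

def perplexity(probs):
    """Perplexity: exp of the average negative log-probability per token."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

# Hypothetical probabilities each model assigns to the same four test tokens.
p_nnlm = [0.20, 0.05, 0.10, 0.40]
p_ngram = [0.10, 0.10, 0.02, 0.30]

p_mix = interpolate(p_nnlm, p_ngram, lam=0.5)
ppl_nnlm = perplexity(p_nnlm)
ppl_ngram = perplexity(p_ngram)
ppl_mix = perplexity(p_mix)
```

In practice the interpolation weight would typically be tuned on held-out data rather than fixed at 0.5. Lower perplexity does not guarantee lower WER, which is why the abstract reports the two separately.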

doi: 10.21437/Interspeech.2014-560

Cite as: Gandhe, A., Metze, F., Lane, I. (2014) Neural network language models for low resource languages. Proc. Interspeech 2014, 2615-2619, doi: 10.21437/Interspeech.2014-560

  author={Ankur Gandhe and Florian Metze and Ian Lane},
  title={{Neural network language models for low resource languages}},
  booktitle={Proc. Interspeech 2014},