ISCA Archive Interspeech 2015

Bag-of-words input for long history representation in neural network-based language models for speech recognition

Kazuki Irie, Ralf Schlüter, Hermann Ney

In most previous work on neural network-based language models (NNLMs), words are represented as 1-of-N encoded feature vectors. In this paper we investigate an alternative encoding of the word history, known as the bag-of-words (BOW) representation of a word sequence, and use it as an additional input feature to the NNLM. Both the feedforward neural network (FFNN) and the long short-term memory recurrent neural network (LSTM-RNN) language models (LMs) with the additional BOW input are evaluated on an English large-vocabulary automatic speech recognition (ASR) task. We show that the BOW features significantly improve both the perplexity (PP) and the word error rate (WER) of a standard FFNN LM. In contrast, the LSTM-RNN LM does not benefit from such an explicit long-context feature. The performance gap between feedforward and recurrent architectures for language modeling is therefore reduced. In addition, we revisit the cache-based LM, a seeming analog of the BOW for count-based LMs, which was unsuccessful for ASR in the past. Although the cache is able to improve the perplexity, we observe only a very small reduction in WER.
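As a rough illustration of the kind of input feature the abstract describes, the sketch below builds a BOW vector over a word history and concatenates it with the 1-of-N encoding of the current word. The exponential decay weighting and the normalization are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def one_hot(word_id: int, vocab_size: int) -> np.ndarray:
    """1-of-N encoding of a single word."""
    v = np.zeros(vocab_size, dtype=np.float32)
    v[word_id] = 1.0
    return v

def bag_of_words(history: list[int], vocab_size: int, decay: float = 1.0) -> np.ndarray:
    """Bag-of-words encoding of a word history.

    Each history word adds weight to its vocabulary dimension; with
    decay < 1.0, older words contribute exponentially less. Weighting
    and normalization here are assumptions made for illustration only.
    """
    bow = np.zeros(vocab_size, dtype=np.float32)
    weight = 1.0
    for word_id in reversed(history):  # most recent word first
        bow[word_id] += weight
        weight *= decay
    norm = bow.sum()
    return bow / norm if norm > 0 else bow

# Example: concatenate the 1-of-N vector of the current word with the
# BOW vector of its (hypothetical) preceding word ids as the LM input.
vocab_size = 10
history = [3, 7, 7, 2]
current = 5
x = np.concatenate([one_hot(current, vocab_size),
                    bag_of_words(history, vocab_size, decay=0.9)])
print(x.shape)  # (20,): 1-of-N part plus BOW part
```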