ISCA Archive Interspeech 2015

Combining multiple-type input units using recurrent neural network for LVCSR language modeling

Vataya Chunwijitra, Ananlada Chotimongkol, Chai Wutiwiwatchai

In this paper, we investigate the use of a Recurrent Neural Network (RNN) to combine hybrid input types, namely words and pseudo-morphemes (PMs), for Thai LVCSR language modeling. As in other neural network frameworks, there is no restriction on RNN input types. To exploit this advantage, the input vector of the proposed hybrid RNN language model (RNNLM) is a concatenation of word and PM vectors. After first-pass decoding with an n-gram LM, the word-based lattice is expanded to include the corresponding PMs of each word. The hybrid RNNLM is then used to re-score this hybrid lattice in second-pass decoding. We tested our hybrid RNNLM on two recognition tasks: broadcast news transcription and mobile speech-to-speech translation. The proposed model achieved better recognition performance than a baseline word-based RNNLM, as hybrid input types provide more flexible unit choices for language model re-scoring. The computational complexity of a full-hybrid RNNLM can be reduced by limiting the input vector to only frequent words and PMs. In such a reduced-hybrid RNNLM, the input vector size can be halved, which considerably reduces both training and decoding time without affecting recognition accuracy.
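
To illustrate the concatenated word-plus-PM input described in the abstract, the following is a minimal sketch of one forward step of an Elman-style RNN over a combined one-hot word vector and one-hot PM vector. All names, vocabulary sizes, and the output layout are hypothetical choices for illustration, not the authors' exact architecture or the paper's configuration.

```python
import numpy as np

# Hypothetical vocabulary and hidden sizes (the paper's actual sizes differ).
N_WORDS, N_PMS, N_HIDDEN = 5, 4, 8

rng = np.random.default_rng(0)
# Weights of a simple Elman-style RNN over the concatenated input
# [word one-hot ; PM one-hot], plus the recurrent hidden state.
W_in = rng.standard_normal((N_HIDDEN, N_WORDS + N_PMS)) * 0.1
W_rec = rng.standard_normal((N_HIDDEN, N_HIDDEN)) * 0.1
W_out = rng.standard_normal((N_WORDS + N_PMS, N_HIDDEN)) * 0.1

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def hybrid_step(word_id, pm_id, hidden):
    """One forward step: concatenate the word and PM one-hot vectors,
    update the hidden state, and return a softmax over output units."""
    x = np.concatenate([one_hot(word_id, N_WORDS), one_hot(pm_id, N_PMS)])
    hidden = np.tanh(W_in @ x + W_rec @ hidden)
    logits = W_out @ hidden
    probs = np.exp(logits - logits.max())
    return probs / probs.sum(), hidden

h = np.zeros(N_HIDDEN)
probs, h = hybrid_step(word_id=2, pm_id=1, hidden=h)
print(probs.shape)  # (N_WORDS + N_PMS,)
```

The reduced-hybrid variant in the abstract would correspond, in this sketch, to restricting the word and PM one-hot blocks to only frequent units (mapping the rest to a shared out-of-list slot), which shrinks the input dimension and hence the input weight matrix.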