ISCA Archive Interspeech 2012

Tied-state mixture language model for WFST-based speech recognition

Hitoshi Yamamoto, Paul R. Dixon, Shigeki Matsuda, Chiori Hori, Hideki Kashioka

This paper describes a language model combination method for automatic speech recognition (ASR) systems based on weighted finite-state transducers (WFSTs). In real applications, ASR performance often degrades when an input utterance falls outside the domain of the prepared language models. Combining multiple language models is one way to cover a wider range of domains. We propose a two-step combination method: it first applies a union operation to incorporate all component models into a single transducer, and then merges states of that transducer so that n-grams included in multiple models are mixed while n-grams unique to each model are retained. The method was evaluated in speech recognition experiments on travel conversation tasks and demonstrated improvements in recognition performance.
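
The two-step idea can be illustrated with a toy sketch. This is not the authors' WFST implementation: it assumes each language model is a plain dictionary mapping an n-gram history (a state) to a word distribution, and it assumes linear interpolation with per-state renormalization as the mixing rule, which the abstract does not specify. The names `union_models`, `tie_states`, and the example models are hypothetical.

```python
from collections import defaultdict

def union_models(models):
    """Step 1 (union): place every component's states side by side,
    tagging each state with the index of the model it came from."""
    return {
        (i, hist): dict(arcs)
        for i, model in enumerate(models)
        for hist, arcs in model.items()
    }

def tie_states(unioned, weights):
    """Step 2 (tying): merge states that share the same history.
    Assumed mixing rule: weighted sum, then renormalize per state, so a
    history found in only one model keeps that model's distribution."""
    tied = defaultdict(lambda: defaultdict(float))
    for (i, hist), arcs in unioned.items():
        for word, prob in arcs.items():
            tied[hist][word] += weights[i] * prob
    result = {}
    for hist, arcs in tied.items():
        total = sum(arcs.values())
        result[hist] = {word: prob / total for word, prob in arcs.items()}
    return result

# Hypothetical bigram models: history tuple -> {next word: probability}.
travel = {("to",): {"tokyo": 0.5, "kyoto": 0.5}, ("book",): {"hotel": 1.0}}
general = {("to",): {"tokyo": 0.25, "be": 0.75}, ("read",): {"book": 1.0}}

mixed = tie_states(union_models([travel, general]), weights=[0.5, 0.5])
```

In this sketch the history `("to",)` appears in both models, so its arcs are mixed, while `("book",)` and `("read",)` each appear in one model only and pass through unchanged.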

Index Terms: Language model combination, WFST