In this paper we describe a specialized Weighted Finite State Transducer (WFST) framework for handling class language models and dynamic vocabulary in automatic speech recognition. The proposed framework has several important features. A fused composition algorithm that substantially reduces the memory usage in comparison to generic WFST operations, and an efficient dynamic vocabulary scheme that allows for arbitrary new words to be added to class based language models on-the-fly without requiring any changes to the pre-compiled transducers. The dynamic vocabulary approach achieves very low run-time costs by representing the dynamic vocabulary items inserted into the language model from an optimum set of existing lexicon items. Experimental results on a voice search task illustrate the low runtime costs of the proposed approach.
Index Terms: speech recognition, decoding, WFST