Natural languages have a layered structure: the underlying meaning and the surface word sequence that realizes it. The mapping from the former to the latter is often one-to-many. This dramatically increases data sparsity when the surface word sequence is modelled directly, for example with n-gram language models (LMs). To address this issue, this paper presents a novel form of language model, the paraphrastic LM. A phrase-level transduction model, statistically learnt from standard text data, is used to generate paraphrase variants. LM probabilities are then estimated by maximizing the marginal probability of the observed word sequences over these variants. Significant error rate reductions of 0.5%-0.6% absolute were obtained on a state-of-the-art conversational telephone speech recognition task using a paraphrastic multi-level LM that models both word and phrase sequences.
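As a rough illustration of the marginalization mentioned above (a sketch only; the symbols $W$ for a training word sequence, $\Psi$ for a paraphrase variant, and the particular decomposition are assumptions made for exposition, not the paper's own equations), the paraphrastic LM training criterion can be viewed as maximizing

\[
\mathcal{F} \;=\; \sum_{W} \log P(W), \qquad
P(W) \;=\; \sum_{\Psi} P(W \mid \Psi)\, P(\Psi),
\]

where the sum over $\Psi$ ranges over the paraphrase variants produced by the phrase-level transduction model.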
Index Terms: language model, paraphrase, speech recognition