This paper proposes a new stochastic language model for speech recognition based on particle N-grams and content-word N-grams. The conventional word N-gram model is considered effective for speech recognition; however, it represents only local constraints between successive words and lacks the ability to describe global syntactic or semantic relationships between words. In the proposed method, the language model gives the N-gram probability of word sequences restricted to particles alone or to content words alone, thereby capturing more global constraints. As an application of this model to speech recognition, a post-processor was constructed that selects the optimum sentence candidate from a phrase lattice produced by a phrase recognition system. The proposed method outperformed a CFG-based method in recognition accuracy, demonstrating its effectiveness in improving speech recognition performance.
Keywords: speech recognition, stochastic language model, particles, content words
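The core idea of class-restricted N-grams can be illustrated with a minimal sketch: filter a POS-tagged sentence down to one word class (e.g. particles only), then score the filtered sequence with an ordinary bigram model. The function names, the toy class labels, and the maximum-likelihood estimation without smoothing are all illustrative assumptions, not the paper's actual formulation.

```python
from collections import Counter

def filter_by_class(tagged_sentence, keep_class):
    # Keep only words belonging to one class, e.g. "particle" or "content"
    # (hypothetical labels; the paper's tag set is not specified here).
    return [word for word, cls in tagged_sentence if cls == keep_class]

def train_bigrams(sequences):
    # Count unigram contexts and bigram pairs over class-filtered sequences,
    # padding each with sentence-boundary markers.
    unigrams, bigrams = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>"] + seq + ["</s>"]
        unigrams.update(padded[:-1])
        bigrams.update(zip(padded[:-1], padded[1:]))
    return unigrams, bigrams

def bigram_prob(seq, unigrams, bigrams):
    # Maximum-likelihood bigram probability of a filtered sequence
    # (no smoothing; unseen contexts would need backoff in practice).
    padded = ["<s>"] + seq + ["</s>"]
    prob = 1.0
    for prev, cur in zip(padded[:-1], padded[1:]):
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob

# Toy usage: particle sequences extracted from two tagged sentences.
corpus = [
    filter_by_class([("kare", "content"), ("wa", "particle"),
                     ("hon", "content"), ("o", "particle")], "particle"),
    filter_by_class([("kanojo", "content"), ("wa", "particle"),
                     ("eki", "content"), ("ni", "particle")], "particle"),
]
unigrams, bigrams = train_bigrams(corpus)
p = bigram_prob(["wa", "o"], unigrams, bigrams)
```

A post-processor in the spirit of the paper would compute such a score for the particle sequence and the content-word sequence of each sentence candidate in the phrase lattice and prefer the candidate with the highest combined probability.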