ISCA Archive ICSLP 2002

Combining lexical and morphological knowledge in language model for inflectional (Czech) language

Jan Nouza, Jindra Drabkova

In this paper we study several ways to enhance language modeling for inflectional languages, namely Czech. We show that some existing smoothing techniques can be further improved to cope with extremely sparse data. We propose several concepts for combining word-based and class-based language models. In our approach, the classes are defined with respect to morphological categories, and their syntactic relations are evaluated through bigrams. In speech recognition experiments, the combination of word bigrams with class statistics yielded a moderate performance improvement.
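To illustrate the general idea of combining word bigrams with class bigrams, the following is a minimal sketch and not the authors' method: the abstract does not state which combination scheme or smoothing was used. It assumes simple linear interpolation with a fixed weight `lam`, unsmoothed maximum-likelihood counts, and a hypothetical one-tag-per-word mapping `word2class`. The toy tokens, the tag names (`ADJ_F`, `NOUN_F`), and all function names are illustrative assumptions.

```python
from collections import Counter

def train(sentences, word2class):
    """Collect word-level and class-level counts (no smoothing; illustrative only)."""
    names = ("word_hist", "word_bi", "class_hist", "class_bi", "class_tot", "emit")
    m = {name: Counter() for name in names}
    m["word2class"] = word2class  # hypothetical word -> morphological class mapping
    for sent in sentences:
        for w in sent:
            c = word2class[w]
            m["class_tot"][c] += 1      # occurrences of class c
            m["emit"][(c, w)] += 1      # occurrences of word w within class c
        for prev, cur in zip(sent, sent[1:]):
            cp, cc = word2class[prev], word2class[cur]
            m["word_hist"][prev] += 1   # times prev appears as a bigram history
            m["word_bi"][(prev, cur)] += 1
            m["class_hist"][cp] += 1
            m["class_bi"][(cp, cc)] += 1
    return m

def ratio(num, den):
    """Relative frequency that returns 0.0 for an unseen history."""
    return num / den if den else 0.0

def combined_prob(m, prev, cur, lam=0.5):
    """Interpolate P_word(cur | prev) with P(c_cur | c_prev) * P(cur | c_cur)."""
    cp, cc = m["word2class"][prev], m["word2class"][cur]
    p_word = ratio(m["word_bi"][(prev, cur)], m["word_hist"][prev])
    p_class = (ratio(m["class_bi"][(cp, cc)], m["class_hist"][cp])
               * ratio(m["emit"][(cc, cur)], m["class_tot"][cc]))
    return lam * p_word + (1.0 - lam) * p_class

# Toy corpus of ASCII-transliterated tokens; the tags are assumed, not from the paper.
sentences = [["nova", "kniha"], ["nova", "skola"], ["stara", "kniha"]]
word2class = {"nova": "ADJ_F", "stara": "ADJ_F",
              "kniha": "NOUN_F", "skola": "NOUN_F"}
model = train(sentences, word2class)
p_unseen = combined_prob(model, "stara", "skola")  # word bigram never observed
p_seen = combined_prob(model, "stara", "kniha")    # word bigram observed once
```

In this toy setting the word pair ("stara", "skola") never occurs, so the word bigram alone would assign it zero probability, while the class term still contributes because an `ADJ_F` followed by a `NOUN_F` was observed. This is the kind of sparse-data relief that class statistics can offer for a heavily inflected vocabulary.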