In this paper, we propose a general method for incorporating class-based language models with a many-to-many word-to-class mapping into the finite-state transducer (FST) framework. Because class-based models alone usually do not improve recognition accuracy, we also present an efficient method for combining language models. As an example, we give a word-to-class mapping based on morphological tags. Several word-based and tag-based language models are evaluated on the task of transcribing Czech broadcast news. The results show that class-based models help achieve a moderate improvement in recognition accuracy.
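
To illustrate what a many-to-many word-to-class mapping implies for scoring, the following minimal sketch computes the probability of a word sequence under a class bigram model by summing over all class sequences with the forward algorithm. It is not the FST construction proposed in the paper; the class labels, the tables `EMIT` and `TRANS`, and the function `class_bigram_prob` are hypothetical and the numbers are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical toy parameters, invented for illustration only.
# P(word | class): a word may belong to several classes (many-to-many).
EMIT = {
    "N": {"play": 0.5, "plays": 0.5},
    "V": {"play": 0.25, "plays": 0.75},
}

# P(class | previous class); "<s>" marks the sentence start.
TRANS = {
    "<s>": {"N": 0.5, "V": 0.5},
    "N": {"N": 0.25, "V": 0.75},
    "V": {"N": 0.75, "V": 0.25},
}


def class_bigram_prob(words):
    """Return P(words), summed over every possible class sequence."""
    # forward[c] = probability of the words seen so far, ending in class c
    forward = {"<s>": 1.0}
    for word in words:
        nxt = defaultdict(float)
        for prev, mass in forward.items():
            for cls, p_trans in TRANS[prev].items():
                # Words outside a class get zero probability in that class.
                p_emit = EMIT[cls].get(word, 0.0)
                nxt[cls] += mass * p_trans * p_emit
        forward = dict(nxt)
    return sum(forward.values())


if __name__ == "__main__":
    print(class_bigram_prob(["play", "plays"]))
```

Because each word can belong to more than one class, the score of a sentence is a sum over class paths rather than a single product, which is the ambiguity that any decoder for such a model has to handle.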