This paper presents the adaptation of language models using the minimum discrimination information criterion. Language model probabilities are adapted based on unigram, bigram, and trigram features using a modified version of the generalized iterative scaling algorithm. Furthermore, a language model compression algorithm based on conditional relative entropy is discussed. It removes from the language model those probability terms that can be closely approximated by back-off distributions. The proposed algorithms are used to adapt a mismatched, newspaper-style language model to a natural language call routing task. The experiments show a significant reduction in perplexity and word error rate for small amounts of adaptation data.
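
To make the compression idea concrete, the following is a minimal sketch of relative-entropy-based pruning for a bigram model. It is an illustration under stated assumptions, not the algorithm as specified in the paper. The function name `prune_bigrams`, the dictionary layout, and the threshold value are all hypothetical. As a simplification, the back-off weights are not recomputed after terms are removed. Each explicit bigram is scored by its weighted contribution to the conditional relative entropy between the full model and its back-off approximation, and terms with a small contribution are dropped.

```python
import math


def prune_bigrams(bigram, unigram, backoff_weight, history_prob, threshold):
    """Drop explicit bigram terms that the back-off estimate approximates well.

    All argument names and layouts are assumptions made for this sketch:
      bigram[(h, w)]    -- explicit probability p(w | h)
      unigram[w]        -- lower-order probability p(w)
      backoff_weight[h] -- back-off weight alpha(h)
      history_prob[h]   -- marginal probability p(h) of the history
    Simplification: back-off weights are NOT renormalized after pruning.
    """
    kept = {}
    for (h, w), p in bigram.items():
        # Estimate the model would fall back to if the explicit term were removed.
        p_backoff = backoff_weight[h] * unigram[w]
        # This term's contribution to the conditional relative entropy.
        cost = history_prob[h] * p * math.log(p / p_backoff)
        if abs(cost) >= threshold:
            kept[(h, w)] = p
    return kept


# Toy example with made-up numbers: ("a", "c") equals its back-off estimate
# (0.5 * 0.2 = 0.1), so removing it changes nothing and it is pruned.
bigram = {("a", "b"): 0.5, ("a", "c"): 0.1}
unigram = {"b": 0.1, "c": 0.2}
backoff_weight = {"a": 0.5}
history_prob = {"a": 1.0}

kept = prune_bigrams(bigram, unigram, backoff_weight, history_prob, threshold=0.01)
print(sorted(kept))
```

A practical implementation would also recompute the back-off weights so that each conditional distribution still sums to one, and would apply the same test to trigram terms against their bigram back-off.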