ISCA Archive IDS 1999

New language model adaptation algorithm based on the definition of cardinal distance

David Janiszek, Renato de Mori, Frédéric Bechet, Driss Matrouf, Chafik Mokbel

Linear transformations are proposed for transforming vectors of Language Model (LM) probabilities. A separate vector is considered for each word, and the j-th element of a vector is the probability of observing the word in the context of its j-th history. If a good general LM is available, the vectors can be clustered into classes and a transformation inferred for each class. Probability distributions of words that are not observed, or that are observed only with low frequency, in the adaptation corpus can then be obtained by applying the transformation of their cluster to the distribution they have in the general model. Experimental results show that there is an interesting range of adaptation corpus sizes in which the perplexity of the adapted LM is lower than the perplexity of the LM whose probabilities are estimated directly from the adaptation data.
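
The per-cluster adaptation described above can be illustrated with a minimal Python sketch. The abstract does not state the form of the transformation or how it is estimated, so everything here is an assumption: the transformation is taken to be diagonal (one scale factor per history), it is fitted by least squares on the well-observed words of a cluster, and the cluster assignment is given by hand. The function names, the toy vocabulary, and all probability values are hypothetical.

```python
# Illustrative sketch only; not the estimation procedure of the paper.
# Each word has a vector whose j-th element is P(word | j-th history).

def fit_diagonal_transform(general, adapted, words):
    """Assumed estimator: for each history j, find the scale a_j minimizing
    sum over words of (a_j * general[w][j] - adapted[w][j]) ** 2."""
    dim = len(general[words[0]])
    scales = []
    for j in range(dim):
        num = sum(general[w][j] * adapted[w][j] for w in words)
        den = sum(general[w][j] ** 2 for w in words)
        scales.append(num / den if den > 0 else 1.0)
    return scales


def apply_transform(scales, vector):
    """Apply the diagonal transformation and clip negative values to zero."""
    return [max(a * p, 0.0) for a, p in zip(scales, vector)]


# Hypothetical general LM: two histories, three words in the same cluster.
general_lm = {
    "a": [0.4, 0.1],
    "b": [0.3, 0.2],
    "c": [0.2, 0.3],
}

# Hypothetical estimates from the adaptation corpus for the words that are
# well observed there; "c" is assumed to be unseen in the adaptation data.
adaptation_estimates = {
    "a": [0.20, 0.2],
    "b": [0.15, 0.4],
}

# Infer the cluster transformation from the well-observed words ...
scales = fit_diagonal_transform(general_lm, adaptation_estimates, ["a", "b"])

# ... and use it to adapt the general-model vector of the unseen word.
adapted_c = apply_transform(scales, general_lm["c"])
print(scales, adapted_c)
```

In a complete model the adapted values would still have to be renormalized so that, for each history, the probabilities over the whole vocabulary sum to one; that step is omitted here for brevity.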