This work introduces a new maximum entropy language model that decomposes the model parameters into a low-rank component, which learns regularities in the training data, and a sparse component, which learns exceptions (e.g., keywords). The low-rank solution corresponds to a continuous-space language model. This model generalizes the standard l1-regularized maximum entropy model and admits an efficient accelerated first-order training algorithm. In conversational speech language modeling experiments, we observe perplexity reductions of 2-5%.
Index Terms: language modeling, maximum entropy, sparse plus low rank decomposition
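As a minimal sketch of the decomposition described above, in notation introduced here for illustration only (the symbols and exact objective are our assumptions, not taken from this abstract): with a parameter matrix \(\Theta \in \mathbb{R}^{V \times d}\) holding one row \(\theta_w\) per vocabulary word and history features \(f(h)\), a sparse plus low-rank maximum entropy model could be trained with a convex objective of the form

\[
p_{\Theta}(w \mid h) \;=\; \frac{\exp\!\big(\theta_w^{\top} f(h)\big)}{\sum_{w'} \exp\!\big(\theta_{w'}^{\top} f(h)\big)},
\qquad \Theta \;=\; L + S,
\]
\[
\min_{L,\,S}\;\; -\sum_{i} \log p_{L+S}(w_i \mid h_i)
\;+\; \lambda_{*}\,\|L\|_{*}
\;+\; \lambda_{1}\,\|S\|_{1},
\]

where \(\|L\|_{*}\) is the nuclear norm (a convex surrogate for rank) and \(\|S\|_{1}\) is the elementwise l1 norm. Under this reading, letting \(\lambda_{*} \to \infty\) forces \(L = 0\) and recovers a standard l1-regularized maximum entropy model, consistent with the generalization claim above, while a factorization \(L = U V^{\top}\) embeds words and histories in a shared low-dimensional space, matching the continuous-space interpretation of the low-rank component.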