In this talk, we present a simple empirical law that vastly outperforms the Akaike and Bayesian Information Criteria at predicting the test-set likelihood of an exponential language model. We discuss the conditions under which this relationship holds; how it can be used to improve the design of language models; and whether these ideas can be applied to other types of statistical models as well. Specifically, we show how this relationship led to the design of "Model M", a class-based language model that outperforms all previous models of its type.
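For reference, the two standard criteria mentioned above are defined, for a model with $k$ free parameters, maximized likelihood $\hat{L}$, and $n$ training samples, as:

```latex
\begin{align}
\mathrm{AIC} &= 2k - 2\ln\hat{L} \\
\mathrm{BIC} &= k\ln n - 2\ln\hat{L}
\end{align}
```

Both criteria penalize model size through the parameter count $k$; the empirical law presented in the talk is claimed to predict held-out likelihood more accurately than either.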