A new framework is proposed to construct large-span, semantically-derived language models for large vocabulary speech recognition. It is based on the latent semantic analysis paradigm, which seeks to automatically uncover the salient semantic relationships between words and documents in a given corpus. Because of its semantic nature, a latent semantic language model is well suited to complement a conventional, more syntactically-oriented n-gram. An integrative formulation is proposed for the combination of the two paradigms. The performance of the resulting integrated language model, as measured by perplexity, compares favorably with the corresponding n-gram performance.