In this paper, we extend the absolute discounting technique along various directions. To estimate the backing-off distribution, we use m-gram singletons, i.e., m-grams that were seen exactly once in the training data. This method is applied in addition to the usual estimation of discounting parameters. The improvement in perplexity is typically between 8% and 12%. We also investigate a cache model. In experimental tests on a large text corpus, the cache model improved the perplexity by up to 28%. The experimental evaluations were carried out on a set of 38 million words from the Wall Street Journal task. We compare our results with the results reported by CMU.
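
As an illustration of the singleton idea, the following minimal sketch applies absolute discounting to the bigram case and builds the backing-off distribution only from bigrams seen exactly once. It is not the formulation used in the paper, which the abstract does not spell out. It assumes an interpolated form of absolute discounting, a single discount b set by the common closed-form estimate n1 / (n1 + 2 * n2), and a toy token list; the function name train_bigram_model and all variable names are hypothetical.

```python
from collections import Counter

def train_bigram_model(tokens):
    """Sketch of bigram absolute discounting with a singleton-based
    backing-off distribution (assumed form, not the paper's exact model)."""
    bigrams = Counter(zip(tokens, tokens[1:]))

    history_count = Counter()        # N(v): how often v occurs as a history
    distinct_successors = Counter()  # number of distinct w with N(v, w) > 0
    singleton_count = Counter()      # number of histories v with N(v, w) == 1
    for (v, w), c in bigrams.items():
        history_count[v] += c
        distinct_successors[v] += 1
        if c == 1:
            singleton_count[w] += 1

    # Closed-form discount estimate from counts-of-counts.
    # Assumes the corpus contains at least one singleton bigram (n1 > 0).
    n1 = sum(1 for c in bigrams.values() if c == 1)
    n2 = sum(1 for c in bigrams.values() if c == 2)
    b = n1 / (n1 + 2 * n2)

    # Backing-off distribution built from singleton bigrams only.
    total_singletons = sum(singleton_count.values())
    beta = {w: c / total_singletons for w, c in singleton_count.items()}

    def prob(w, v):
        """p(w | v); assumes v was observed as a history in training."""
        n_v = history_count[v]
        discounted = max(bigrams[(v, w)] - b, 0.0) / n_v
        backoff_mass = b * distinct_successors[v] / n_v
        return discounted + backoff_mass * beta.get(w, 0.0)

    return prob, b

# Toy usage: the conditional probabilities for a seen history sum to one.
tokens = "a b a b a c b c a b".split()
prob, b = train_bigram_model(tokens)
total = sum(prob(w, "a") for w in set(tokens))
```

In this sketch a word that never occurs in a singleton bigram receives zero backing-off probability, so a practical model would need a further fallback such as a unigram or uniform floor; that step is omitted here.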