In this paper, we apply the leaving-one-out concept to the estimation of 'small' probabilities, i.e. the case where the number of training samples is much smaller than the number of possible classes. After deriving the Turing-Good formula in this framework, we introduce three specific models designed to avoid the problems of the original Turing-Good formula: the constrained model, the absolute discounting model and the linear discounting model. These models are then applied to the problem of bigram-based stochastic language modelling. Experimental results are presented for an English corpus of 1.1 million words.
Keywords: Stochastic Language Modelling, Leaving-One-Out, Turing-Good Method, Insufficient Training Data