We have developed a statistical model of speech that incorporates certain temporal properties of human speech perception. The primary goal of this work is to avoid several constraining assumptions made by current statistical speech recognition systems, particularly the model of speech as a sequence of stationary segments consisting of uncorrelated acoustic vectors. A focus on perceptual models may in principle allow statistical modeling of speech components that are more relevant for discriminating between candidate utterances during recognition. In particular, we hope to develop systems that share some of the robustness of human audition to speech collected under adverse conditions. We outline this new research direction here and present some preliminary theoretical work.