A stochastic framework for articulatory speech recognition is presented. Utterances are described in terms of overlapping phonological units built into a Markov chain, where each state is identified with a set of acoustic/articulatory correlates represented by a target distribution on an articulatory space. Articulator motion is modelled by a Markov-modulated stochastic linear dynamical system, and observations of the articulatory state are generated in an acoustic space through a non-linear mapping. Procedures for state and parameter estimation, based on the EM algorithm and extended Kalman filtering techniques, are outlined and illustrated using artificial data.
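The generative model described above can be sketched numerically. The following is a minimal illustration, not the paper's implementation: a two-state Markov chain selects an articulatory target, the articulatory state follows linear dynamics pulled toward that target, and acoustic observations pass through a non-linear map. All dimensions, parameter values, and the choice of `tanh` as the articulatory-to-acoustic mapping are illustrative assumptions; an extended Kalman filter then tracks the articulatory state from the acoustic observations, here assuming the discrete state sequence is known.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-unit Markov chain; each unit has an articulatory target.
# Dynamics:     x_{t+1} = A x_t + B u(s_t) + w_t   (pull toward target u(s))
# Observation:  y_t     = h(x_t) + v_t             (non-linear acoustic map)
n_units = 2
targets = np.array([[1.0, 0.0], [-1.0, 0.5]])  # per-unit articulatory targets
P = np.array([[0.9, 0.1], [0.1, 0.9]])         # Markov transition matrix
A = 0.8 * np.eye(2)                            # linear articulator dynamics
B = 0.2 * np.eye(2)                            # drive toward the active target
Q = 0.01 * np.eye(2)                           # process noise covariance
R = 0.05 * np.eye(2)                           # observation noise covariance

def h(x):
    # Assumed non-linear articulatory-to-acoustic mapping.
    return np.tanh(x)

def H_jac(x):
    # Jacobian of h, used to linearise the observation model in the EKF.
    return np.diag(1.0 - np.tanh(x) ** 2)

# --- simulate one utterance from the generative model ---
T = 50
s, x = 0, np.zeros(2)
xs, ys, ss = [], [], []
for t in range(T):
    s = rng.choice(n_units, p=P[s])
    x = A @ x + B @ targets[s] + rng.multivariate_normal(np.zeros(2), Q)
    y = h(x) + rng.multivariate_normal(np.zeros(2), R)
    xs.append(x); ys.append(y); ss.append(s)

# --- extended Kalman filter over the articulatory state ---
m, Pcov = np.zeros(2), np.eye(2)
est = []
for t in range(T):
    # Predict through the Markov-modulated linear dynamics.
    m = A @ m + B @ targets[ss[t]]
    Pcov = A @ Pcov @ A.T + Q
    # Update: linearise h around the predicted mean.
    Hx = H_jac(m)
    S = Hx @ Pcov @ Hx.T + R
    K = Pcov @ Hx.T @ np.linalg.inv(S)
    m = m + K @ (ys[t] - h(m))
    Pcov = (np.eye(2) - K @ Hx) @ Pcov
    est.append(m)

est, xs = np.array(est), np.array(xs)
print("mean abs tracking error:", np.mean(np.abs(est - xs)))
```

In the full framework, the discrete state sequence would itself be inferred, and the model parameters re-estimated via EM; the EKF recursion above corresponds only to the inner state-estimation step under fixed parameters.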