Current techniques for training representations of the articulatory-acoustic mapping from data rely on artificial simulations to provide codebooks of articulatory and acoustic measurements, which are then modelled by simple functional approximations. This paper outlines a stochastic framework that uses the EM algorithm to adapt such an artificial model to real speech from acoustic measurements alone. It shows that the parameter and state estimation problems of articulatory-acoustic inversion can be solved with a statistical approach based on non-linear filtering.