A theoretical and experimental analysis of a simple multilevel segmental HMM is presented, in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate (articulatory) layer where speech dynamics are modeled using linear trajectories. Three formant-based parameterizations, from the TIMIT corpus, and measured articulatory positions, from the MOCHA corpus, are considered as intermediate representations. The articulatory-to-acoustic mapping is performed by a set of between 1 and 49 linear transformations. Results of phone-classification experiments demonstrate that, with an appropriate choice of intermediate parameterization and mappings, it is possible to achieve close to optimal performance.
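
As an illustration only, the two-stage structure described above can be written as follows. The notation is assumed for this sketch and does not come from the abstract: a segment of duration $T$ has a linear trajectory in the intermediate layer with hypothetical midpoint $c$ and slope $m$, and each intermediate vector is taken to the acoustic space by one of $K$ linear transformations $W_k$. The additive residual $\epsilon_t$ and the midpoint-centered time axis are likewise assumptions.

```latex
% Assumed notation: c (trajectory midpoint), m (slope), T (segment duration),
% W_k (k-th linear articulatory-to-acoustic transformation), \epsilon_t (residual).
\begin{align}
  f(t) &= c + m\,(t - \bar{t}), & \bar{t} &= \frac{T+1}{2}, & t &= 1,\dots,T, \\
  y_t  &= W_k\, f(t) + \epsilon_t, & k &\in \{1,\dots,K\}, & 1 &\le K \le 49.
\end{align}
```

In this reading, $K = 1$ corresponds to a single transformation shared by all phone classes, while larger $K$ allows different transformations for different groups of phones; how the transformations are assigned to classes is not stated in the abstract.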