In this paper, we motivate and present a sketch of an extrinsic timing model of speech production. It is a three-stage model, comprising 1) a phonological planning stage, in which symbolic segmental representations are sequenced and slotted into an appropriate prosodic structure, and context-appropriate acoustic cues are selected for each segment; 2) a phonetic planning stage, in which cues are mapped onto sets of articulators and appropriate values for the spatial and temporal parameters of movement are computed; and 3) a motor-sensory implementation stage, in which articulator movements are generated and tracked. We cite model components from the literature that accomplish many of the functions this type of model requires.
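To make the division of labor among the three stages concrete, the following minimal Python sketch expresses the pipeline as three composed functions. All names, data structures, and placeholder values here are illustrative assumptions for exposition only; they are not components of the model itself.

```python
# Hypothetical sketch of the three-stage pipeline. Every name, type,
# and value below is an illustrative assumption, not part of the model.
from dataclasses import dataclass, field


@dataclass
class PlannedSegment:
    symbol: str                       # symbolic segmental representation
    prosodic_slot: str                # position within prosodic structure
    acoustic_cues: list[str] = field(default_factory=list)


@dataclass
class MovementSpec:
    articulators: list[str]           # articulator set mapped from the cues
    spatial_target: float             # placeholder spatial parameter
    duration_ms: float                # placeholder temporal parameter


def phonological_planning(segments: list[str]) -> list[PlannedSegment]:
    """Stage 1: sequence segments, slot them into prosodic structure,
    and select context-appropriate acoustic cues (placeholder logic)."""
    return [
        PlannedSegment(s, prosodic_slot=f"slot-{i}", acoustic_cues=["cue"])
        for i, s in enumerate(segments)
    ]


def phonetic_planning(plan: list[PlannedSegment]) -> list[MovementSpec]:
    """Stage 2: map cues onto articulator sets and compute spatial and
    temporal movement parameters (placeholder values)."""
    return [
        MovementSpec(articulators=["tongue"], spatial_target=1.0,
                     duration_ms=80.0)
        for _ in plan
    ]


def motor_sensory_implementation(specs: list[MovementSpec]) -> None:
    """Stage 3: generate articulator movements and track their progress."""
    for spec in specs:
        print(f"moving {spec.articulators} for {spec.duration_ms} ms")


# The stages compose in sequence: planning output feeds implementation.
motor_sensory_implementation(phonetic_planning(phonological_planning(["t", "a"])))
```

The point of the sketch is only the staging: symbolic and prosodic decisions are finished before any spatial or temporal movement parameters are computed, and timing is specified extrinsically to the implementation stage rather than emerging from it.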