We explore a simple model of speech articulation. The model combines an articulator with the ability to remember and improve the neural drive signal that controls it. Over many productions, the system learns a neural drive signal that accurately matches acoustically defined targets. In fact, the match can be better than expected, yielding narrower regions of coarticulation than the intrinsic muscle response time would suggest. Further, although the muscle introduces a time delay, the articulatory response shows no delay, because the learned neural drive signal changes in advance of changes in the acoustic targets. Finally, we test the model against tonal production data from Mandarin conversation and show that it can represent non-trivial surface intonation patterns with simple, linguistically reasonable targets.
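As a concrete illustration of the learning loop described above, the following is a minimal sketch under assumed specifics: the articulator is modeled as a first-order low-pass ("muscle") response to a neural drive signal, and the remembered drive is improved across productions by a simple error-correction update. The dynamics, the update rule, and all parameter values (time constant, learning rate, two-valued target) are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

# Minimal sketch: a first-order "muscle" articulator driven by a neural
# drive signal, with the drive remembered and improved over repeated
# productions by a simple error-correction update. All parameters are
# illustrative assumptions.

dt, tau, eta = 0.005, 0.05, 0.5          # time step (s), muscle time constant (s), learning rate
t = np.arange(0.0, 1.0, dt)
target = np.where(t < 0.5, 1.0, -1.0)    # two successive acoustically defined targets

def produce(drive):
    """One production: articulator output as a lagged response to the drive."""
    y = np.zeros_like(drive)
    for i in range(1, len(drive)):
        y[i] = y[i - 1] + (dt / tau) * (drive[i - 1] - y[i - 1])
    return y

drive = np.zeros_like(t)                 # remembered neural drive signal
for production in range(200):            # many productions
    err = target - produce(drive)
    # Improve the remembered drive; correcting one step ahead of each error
    # compensates for the one-sample lag of the articulator.
    drive[:-1] += eta * err[1:]

print("final RMS mismatch:", np.sqrt(np.mean((target - produce(drive)) ** 2)))
```

In this toy setting the learned drive overshoots around each target change, pushing the articulator through the transition faster than its intrinsic time constant would allow, which loosely echoes the narrower-than-expected coarticulation regions mentioned above; the paper's own learning rule and results may of course differ.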