We present a method for specifying acoustic targets for articulatory synthesis. The controller is driven by an "acoustic" score that specifies the skeleton of the desired audio-visual properties of the output. Here we examine whether simple vowel-to-vowel (VV) and consonant-vowel (CV) transitions can be specified in this way.