The goal of this work is to use phonetic recognition to drive a synthetic image with speech. Phonetic units are identified by the phonetic recognition engine and mapped to mouth gestures known as visemes, the visual counterparts of phonemes. The acoustic waveform and visemes are then sent to a synthetic image player, called FaceMe!, where they are rendered synchronously. This paper provides background on the core technologies involved in this process and describes asynchronous and synchronous prototypes of a combined phonetic recognition/FaceMe! system, which we use to render mouth gestures on an animated face.
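To make the phoneme-to-viseme step concrete, the following is a minimal sketch of such a mapping. The table `PHONEME_TO_VISEME`, the viseme labels, and the `neutral` fallback are illustrative assumptions, not the actual inventory used by the recognition engine or FaceMe!; in practice the mapping is many-to-one, since several phonemes share the same mouth shape.

```python
# Hypothetical many-to-one phoneme-to-viseme table (illustrative only).
PHONEME_TO_VISEME = {
    "p": "closed_lips", "b": "closed_lips", "m": "closed_lips",
    "f": "lip_teeth",   "v": "lip_teeth",
    "aa": "open_wide",  "ae": "open_wide",
    "iy": "spread",     "ih": "spread",
    "uw": "rounded",    "ow": "rounded",
}

def phonemes_to_visemes(phonemes):
    """Map recognized phonetic units to viseme labels,
    falling back to a neutral mouth shape for unmapped units."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

if __name__ == "__main__":
    # e.g. a recognized phoneme sequence -> viseme labels to render
    print(phonemes_to_visemes(["m", "aa", "p"]))
    # ['closed_lips', 'open_wide', 'closed_lips']
```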