Today, synthetic speech is often based on the concatenation of natural speech: units such as diphones or polyphones are extracted from recordings of natural speech and then joined to form arbitrary words or sentences. So far, a visual modality has mainly been added to such synthesis in two ways: morphing between single images or concatenating video sequences. In this study, however, a new method is presented in which recorded natural movements of points on the face are used to control an animated face.
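To illustrate the concatenative principle mentioned above, the following is a minimal sketch, assuming a hypothetical diphone inventory and a toy phone sequence; the inventory contents, the silence marker, and the naive join with no boundary smoothing are all illustrative assumptions, not the actual synthesis method described in this paper.

```python
def word_to_diphones(phones):
    """Split a phone sequence into overlapping diphone names,
    including the silence-to-phone transitions at the edges."""
    padded = ["_"] + phones + ["_"]  # "_" marks silence (assumed convention)
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]

def synthesize(phones, inventory):
    """Concatenate pre-recorded waveform snippets, one per diphone.
    A real system would also smooth the joins; this sketch does not."""
    samples = []
    for diphone in word_to_diphones(phones):
        samples.extend(inventory[diphone])  # naive join, no smoothing
    return samples

# Toy inventory: each diphone maps to a (fake) short list of samples.
inventory = {
    "_-h": [0.0, 0.1], "h-i": [0.2, 0.3], "i-_": [0.1, 0.0],
}
print(word_to_diphones(["h", "i"]))  # ['_-h', 'h-i', 'i-_']
```

The same lookup-and-concatenate idea underlies the video-based visual approaches the paragraph mentions, with image or video units taking the place of waveform snippets.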