ISCA Archive AVSP 2010

In pursuit of visemes

Sarah Hilder, Barry-John Theobald, Richard Harvey

We describe preliminary work towards an objective method for identifying visemes. Active appearance model (AAM) features are used to parameterise a speaker’s lips and jaw during speech. The temporal behaviour of the AAM features between automatically identified salient points is used to represent visual speech gestures, and visemes are created by clustering these gestures using dynamic time warping (DTW) as the cost function. This method produces a significantly more structured model of visual speech than is obtained by assuming a typical phoneme-to-viseme mapping.

Index Terms: Visemes, visual speech encoding
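
As an illustration of the DTW cost mentioned in the abstract, the following is a minimal sketch of how two gestures of different durations might be compared, and how a pairwise cost matrix might be assembled for clustering. It assumes each gesture is an array of shape (frames, feature dimensions), uses a Euclidean frame-to-frame distance, and applies the textbook DTW recursion without path constraints. These choices, and the names dtw_cost and pairwise_dtw, are assumptions made for illustration and are not details taken from the paper.

```python
import numpy as np


def dtw_cost(a, b):
    """Textbook DTW alignment cost between two gestures.

    a, b: array-likes of shape (frames, dims), e.g. AAM feature trajectories.
    The Euclidean local distance is an assumption, not the paper's stated choice.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # acc[i, j] holds the cheapest cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            acc[i, j] = d + min(acc[i - 1, j],      # repeat frame b[j-1]
                                acc[i, j - 1],      # repeat frame a[i-1]
                                acc[i - 1, j - 1])  # advance both
    return float(acc[n, m])


def pairwise_dtw(gestures):
    """Symmetric matrix of DTW costs between every pair of gestures."""
    k = len(gestures)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_cost(gestures[i], gestures[j])
    return dist
```

A matrix built this way could be passed to any clustering routine that accepts precomputed distances. Which clustering algorithm to use is left open here, since the abstract does not specify one.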