ISCA Archive Interspeech 2011

Weight optimization for bimodal unit-selection talking head synthesis

Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte

This paper addresses talking head synthesis based on the concatenation of units comprising both acoustic and visual information. The diphone units used to synthesize a given text string are selected by minimizing a weighted linear combination of four costs that reflect linguistic, acoustic, and visual considerations. We present initial work toward a method for automatically determining the weight applied to each cost, using a series of metrics that quantitatively assess synthesis performance.
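
As an illustration of the selection criterion described above, the following minimal sketch combines four per-unit costs into one weighted sum and picks the candidate with the lowest total. It is not the authors' implementation: the function names, the candidate names, and all cost and weight values are hypothetical, and a full unit-selection system would search over whole unit sequences rather than score one candidate slot in isolation.

```python
from typing import Mapping, Sequence


def weighted_cost(costs: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted linear combination sum(w_i * c_i)."""
    if len(costs) != len(weights):
        raise ValueError("costs and weights must have the same length")
    return sum(w * c for w, c in zip(weights, costs))


def select_unit(candidates: Mapping[str, Sequence[float]],
                weights: Sequence[float]) -> str:
    """Return the name of the candidate with the lowest weighted cost."""
    return min(candidates,
               key=lambda name: weighted_cost(candidates[name], weights))


# Hypothetical candidates for one diphone slot, each with four costs.
# The abstract does not name the individual costs, so the tuple
# positions here are placeholders.
candidates = {
    "unit_a": (0.2, 0.5, 0.1, 0.4),
    "unit_b": (0.1, 0.3, 0.9, 0.2),
}

# Equal weights favor unit_a; down-weighting the third cost favors unit_b.
print(select_unit(candidates, (1.0, 1.0, 1.0, 1.0)))
print(select_unit(candidates, (1.0, 1.0, 0.1, 1.0)))
```

The two calls select different units from the same candidates, which is why the choice of weights matters and why a method for setting them automatically is of interest.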