To tele-operate the lip motion of a humanoid robot (such as an android) from the operator's utterances, we developed a speech-driven lip motion generation method. The proposed method rotates the vowel space, defined by the first and second formants, around the center vowel and maps the rotated coordinates to lip opening degrees. Only one parameter needs to be calibrated for speaker normalization, so no further model training is required. In a pilot experiment, the proposed audio-based method was perceived as more natural than vision-based approaches, regardless of the language.
Index Terms. lip motion, formant, humanoid robot, teleoperation, synchronization
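As an illustration of the idea summarized in the abstract, the following minimal sketch shifts a (F1, F2) formant pair so that the center vowel lies at the origin, rotates the plane, and maps one rotated coordinate to a lip opening degree. It is not the authors' implementation: the function name `lip_opening`, the center vowel values, the rotation angle, and the linear mapping are all hypothetical placeholders. The abstract does not state which single parameter is calibrated per speaker, so `scale` is used here only as a stand-in.

```python
import math


def lip_opening(f1, f2, center=(500.0, 1500.0), angle_deg=25.0, scale=400.0):
    """Map a (F1, F2) formant pair in Hz to a lip opening degree in [0, 1].

    All default values are illustrative assumptions, not values from the paper.
    """
    # Shift the vowel space so that the assumed center vowel is the origin.
    d1 = f1 - center[0]
    d2 = f2 - center[1]

    # Rotate the (F1, F2) plane around the center vowel and keep one axis.
    theta = math.radians(angle_deg)
    rotated = d1 * math.cos(theta) + d2 * math.sin(theta)

    # Assumed linear mapping: the center vowel maps to 0.5, and `scale`
    # (a hypothetical per-speaker parameter) sets the usable range.
    opening = 0.5 + rotated / (2.0 * scale)

    # Clamp to the valid actuator range.
    return min(1.0, max(0.0, opening))
```

In use, a formant tracker would supply one (F1, F2) estimate per frame, and each returned value would be sent to the lip actuator. With `angle_deg=0.0` the sketch reduces to a mapping from F1 alone, which shows what the rotation adds.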