ISCA Archive SSW 2007
ISCA Archive SSW 2007

Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV

Junichi Yamagishi, Takao Kobayashi, Steve Renals, Simon King, Heiga Zen, Tomoki Toda, Keiichi Tokuda

For constructing a speech synthesis system which can achieve diverse voices, we have been developing a speaker independent approach of HMM-based speech synthesis in which statistical average voice models are adapted to a target speaker using a small amount of speech data. In this paper, we incorporate a high-quality speech vocoding method STRAIGHT and a parameter generation algorithm with global variance into the system for improving quality of synthetic speech. Furthermore, we introduce a feature-space speaker adaptive training algorithm and a gender mixed modeling technique for conducting further normalization of the average voice model. We build an English text-to-speech system using these techniques and show the performance of the system.