ISCA Archive ICSLP 2002

Improved corpus-based synthesis of fundamental frequency contours using generation process model

Keikichi Hirose, Masaya Eto, Nobuaki Minematsu

We have been developing a corpus-based method for synthesizing fundamental frequency (F0) contours for Japanese text-to-speech (TTS) conversion systems. Because the method synthesizes contours under the constraint of the F0 contour generation process model, reasonably good quality is maintained even when the prediction is inaccurate. Although the synthesized F0 contours had already been shown to sound as natural as those generated with heuristic rules arranged by experts, quality was occasionally low, depending on the sentence to be synthesized. To address this, several features, including a code representing syntactic boundary depth obtainable through automatic parsing, were added to the input parameters of the statistical methods, which resulted in better prediction. The boundary depth code proved especially effective for improving the prediction of phrase component parameters.