ISCA Archive SpeechProsody 2008

Improved prediction of tone components for F0 contour generation of Mandarin speech based on the tone nucleus model

Qinghua Sun, Keikichi Hirose, Nobuaki Minematsu

Improved prediction of tone components was realized in our method for synthesizing sentence fundamental frequency (F0) contours of Mandarin speech. The method represents a sentence's logarithmic F0 contour as a superposition of tone components on phrase components, as in the generation process model (F0 model). The tone components are realized by concatenating their fragments at tone nuclei, which are predicted by a corpus-based method, while the phrase components are generated by rules within the F0 model framework. In the original method, the tone components were assumed to have shapes similar to those of the F0 contours at the tone nuclei. This rests on the assumption that the phrase components are almost flat throughout an utterance. However, that assumption does not hold, especially at the onsets of phrase components. To cope with this problem, the parameters representing the tone components at tone nuclei are modified. In addition, parameters predicted in earlier processes are used in the prediction of subsequent processes. A listening test on synthetic speech with F0 contours generated by our methods and by the HMM-based method confirmed the advantage of our methods, especially the improved version.

1. Introduction
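As a rough illustration of the superposition summarized in the abstract, the sketch below adds a phrase component and a tone component to a baseline in the logarithmic F0 domain. It assumes the standard phrase-control impulse response of the generation process model, alpha^2 * t * exp(-alpha * t) for t >= 0. The function names, the value alpha = 3.0, the 100 Hz baseline, and the single phrase command are illustrative assumptions and are not taken from the paper. The tone component is left as a flat placeholder, since in the method it comes from the corpus-based prediction.

```python
import math

def phrase_response(t, alpha=3.0):
    """Phrase-control impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def phrase_component(t, commands, alpha=3.0):
    """Sum the responses to (onset_time, magnitude) phrase commands at time t."""
    return sum(mag * phrase_response(t - onset, alpha) for onset, mag in commands)

def log_f0(t, base_hz, commands, tone_component, alpha=3.0):
    """ln F0(t) = ln(baseline) + phrase component + tone component."""
    return math.log(base_hz) + phrase_component(t, commands, alpha) + tone_component(t)

def flat_tone(t):
    """Placeholder tone component (hypothetical); the method predicts this from a corpus."""
    return 0.0

# One phrase command at t = 0 s with magnitude 0.5 (illustrative values).
commands = [(0.0, 0.5)]
for t in (0.0, 0.1, 1.0 / 3.0, 1.0, 2.0):
    print(f"t={t:.2f} s  phrase={phrase_component(t, commands):.4f}  "
          f"lnF0={log_f0(t, 100.0, commands, flat_tone):.4f}")
```

Under these assumptions the phrase component peaks at t = 1/alpha after the command onset and then decays, so it is far from flat near the phrase onset. This is the situation in which the F0 contour at a tone nucleus differs in shape from the underlying tone component.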