ISCA Archive ISCSLP 2008

Multi-Layer F0 Modeling For HMM-Based Speech Synthesis

Cheng-Cheng Wang, Zhen-Hua Ling, Bu-Fan Zhang, Li-Rong Dai

This paper proposes a two-layer fundamental frequency (F0) modeling method for HMM-based parametric speech synthesis. In the conventional HMM-based speech synthesis system, F0 models are trained for each context-dependent phoneme. To account for the suprasegmental characteristics of F0 features, this paper introduces an explicit syllable-layer F0 model in addition to the phone-layer model. At the synthesis stage, the F0 contour is generated by maximizing the combined likelihood of the phone-layer and syllable-layer F0 models. Objective and subjective evaluation results in our experiments show that the proposed multi-layer F0 modeling method can improve the performance of F0 prediction for emotional speech synthesis.

Index Terms: speech synthesis, hidden Markov model, fundamental frequency modeling
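
The generation step described in the abstract, maximizing a combined likelihood over two layers, can be illustrated with a deliberately simplified sketch. It is not the authors' implementation, and everything below is an assumption made for illustration: each frame has an independent Gaussian on log-F0 from the phone layer (dynamic features are ignored), the syllable layer is a single Gaussian on the mean log-F0 of each syllable, and the two log-likelihoods are summed with a hypothetical weight `weight`. The function name `combine_two_layer_f0` and its arguments are likewise hypothetical. Under these assumptions the maximizer has a closed form per syllable.

```python
def combine_two_layer_f0(phone_means, phone_vars, syllables, weight=1.0):
    """Return the log-F0 contour maximizing the combined log-likelihood.

    Assumed (illustrative) objective, maximized over the contour f:
        sum_t  -(f[t] - phone_means[t])**2 / (2 * phone_vars[t])
      + weight * sum_s -(mean(f over syllable s) - syl_mean)**2 / (2 * syl_var)

    phone_means, phone_vars: per-frame Gaussian mean and variance (phone layer).
    syllables: list of (start, end, syl_mean, syl_var); frames start..end-1
               belong to one syllable (syllable layer, assumed form).
    """
    f0 = list(phone_means)
    for start, end, syl_mean, syl_var in syllables:
        n = end - start
        # Syllable mean predicted by the phone layer alone.
        mu_bar = sum(phone_means[start:end]) / n
        # Average phone-layer variance inside the syllable.
        r = sum(phone_vars[start:end]) / n
        # Strength of the syllable-layer pull, spread over n frames.
        k = weight / (syl_var * n)
        # Setting the gradient of the objective to zero gives this shift.
        c = k * (mu_bar - syl_mean) / (1.0 + k * r)
        for t in range(start, end):
            # Frames with larger phone-layer variance move further.
            f0[t] = phone_means[t] - c * phone_vars[t]
    return f0


if __name__ == "__main__":
    # Two syllables of three frames each; values are made-up log-F0 numbers.
    means = [5.0, 5.1, 5.0, 5.3, 5.4, 5.3]
    variances = [0.02, 0.02, 0.04, 0.02, 0.02, 0.04]
    syls = [(0, 3, 5.2, 0.01), (3, 6, 5.2, 0.01)]
    print(combine_two_layer_f0(means, variances, syls, weight=1.0))
```

With `weight=0` the sketch returns the phone-layer means unchanged. Larger weights, or smaller syllable-layer variances, pull each syllable's average log-F0 toward the syllable-layer mean, which is the intuition behind combining the two layers.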