ISCA Archive SpeechProsody 2004
ISCA Archive SpeechProsody 2004

F0 analysis and modeling for Cantonese text-to-speech

Yujia Li, Tan Lee, Yao Qian

This paper presents a study on the control of fundamental frequency (F0) in Cantonese text-to-speech (TTS) systems. The surface F0 contour of an utterance is considered as the combination of tone-related local components and phrase-level long-term variation. A novel method of F0 normalization has been developed to effectively separate them. Statistical analysis is performed for the phrase curves and the tone contours extracted from a large speech corpus, and the results are summarized into regular patterns. These patterns are used as the basic templates in a non-parametric F0 model, from which utterance-level F0 contours can be generated. Perceptual test shows the naturalness of speech naturalness is significantly improved by the new F0 model. The MOS increases by 0.65 over a five-point scale.