ISCA Archive ISCSLP 2006

A Unified Totally-Data-Driven Framework for Duration and Intonation Modeling

Lifu Yi, Jian Li, Xiaoyan Lou, Jie Hao

This paper proposes a unified framework for duration and intonation modeling in Mandarin TTS. Within this framework, we design a novel parametric representation of Mandarin intonation based on orthogonal polynomial approximation. This representation decomposes the F0 vector into three orthogonal polynomial parameters, each of which is a continuous scalar. Given this vector-to-scalar decomposition, both duration and the F0 representation parameters can be predicted from linguistic and phonetic attributes by generalized linear models (GLMs) in a unified manner. The GLM coefficients are trained from data. Furthermore, the model structure, i.e., the set of significant attributes and attribute interactions in the GLM, is also optimized automatically from data rather than chosen by intuition. The proposed framework is therefore totally data-driven. In objective evaluation experiments, the new approach achieves prediction performance comparable to or higher than that of other strong approaches. Informal subjective perceptual experiments indicate that the predicted duration and intonation are appropriate and natural.

Keywords: duration modeling, intonation modeling, F0 contour parametric representation, generalized linear model, speech synthesis, stepwise regression.
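As an illustration of the vector-to-scalar decomposition described in the abstract, the following is a minimal sketch of projecting a sampled F0 contour onto three discrete orthogonal polynomials. It rests on assumptions not stated in the abstract: the basis is built by QR-orthogonalizing the monomials 1, t, t^2 on a normalized time grid, the contour is uniformly sampled, and the function names `orthonormal_poly_basis`, `decompose_f0`, and `reconstruct_f0` are hypothetical. The paper's actual basis and normalization may differ.

```python
import numpy as np

def orthonormal_poly_basis(n_points, order=2):
    """Return an (n_points x (order+1)) matrix whose columns are
    polynomials of degree 0..order, orthonormal on the sample grid.
    Assumption: basis obtained by QR decomposition of the monomials."""
    t = np.linspace(-1.0, 1.0, n_points)              # normalized time axis
    vandermonde = np.vander(t, order + 1, increasing=True)  # columns 1, t, t^2
    q, _ = np.linalg.qr(vandermonde)                  # reduced QR, orthonormal columns
    return q

def decompose_f0(f0, order=2):
    """Project an F0 vector onto the basis, giving order+1 scalar parameters."""
    f0 = np.asarray(f0, dtype=float)
    basis = orthonormal_poly_basis(len(f0), order)
    return basis.T @ f0

def reconstruct_f0(coeffs, n_points):
    """Rebuild an F0 contour of n_points samples from its scalar parameters."""
    coeffs = np.asarray(coeffs, dtype=float)
    basis = orthonormal_poly_basis(n_points, len(coeffs) - 1)
    return basis @ coeffs

if __name__ == "__main__":
    # Synthetic contour in Hz (illustrative values, not data from the paper).
    t = np.linspace(-1.0, 1.0, 10)
    f0 = 200.0 + 30.0 * t - 20.0 * t ** 2
    params = decompose_f0(f0)
    print("scalar parameters:", params)
    print("max reconstruction error:",
          np.max(np.abs(reconstruct_f0(params, len(f0)) - f0)))
```

Because the basis columns are orthonormal on the sample grid, each parameter is obtained by an independent inner product, so each one can serve as a separate scalar regression target alongside duration.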