ISCA Archive Interspeech 2006

Totally data-driven duration modeling based on generalized linear model for Mandarin TTS

Lifu Yi, Jian Li, Xiaoyan Lou, Jie Hao

This paper proposes a totally data-driven duration modeling method for Mandarin TTS. It uses Generalized Linear Models (GLM) to model duration and stepwise regression to select the attribute set automatically on the basis of statistical measures, which yields a good tradeoff between model complexity and goodness of fit. In addition, speaking rate is introduced as a new modeling attribute; this not only improves performance but also provides a novel way to adjust the speaking rate during synthesis. We also propose using R² to compare modeling performance fairly across different databases, since R² is the fraction of the variance in the data that a model explains. Experiments show that GLM performs significantly better than CART. With much smaller models and a much smaller corpus, the proposed method also achieves results comparable to those reported in other studies.
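
The argument for R² as a cross-database measure can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names and the duration values (in milliseconds) are hypothetical, chosen only to show that two corpora with the same absolute error can differ widely in the fraction of variance explained.

```python
import math


def r_squared(observed, predicted):
    """Fraction of the variance in `observed` explained by `predicted`."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_total == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_resid / ss_total


def rmse(observed, predicted):
    """Root-mean-square error, in the same unit as the durations."""
    n = len(observed)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n)


# Hypothetical corpus A: durations vary widely (ms).
obs_a = [100.0, 150.0, 200.0, 250.0]
pred_a = [110.0, 140.0, 210.0, 240.0]

# Hypothetical corpus B: durations vary little (ms).
obs_b = [160.0, 170.0, 180.0, 190.0]
pred_b = [170.0, 160.0, 190.0, 180.0]

for name, obs, pred in (("A", obs_a, pred_a), ("B", obs_b, pred_b)):
    print(name, "RMSE =", round(rmse(obs, pred), 3),
          "R^2 =", round(r_squared(obs, pred), 3))
```

In this made-up example every prediction is off by 10 ms in both corpora, so the RMSE is identical, yet R² is about 0.968 for corpus A and about 0.2 for corpus B. An absolute error figure alone would therefore rank the two models as equal, while R² reflects how much of each database's own duration variance is captured.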