ISCA Archive ICSLP 2002

Phonetic normalization using z-score in segmental prosody estimation for corpus-based TTS system

Hoeun Song, Jaein Kim, Kyongrok Lee, Jinyoung Kim

Recently, corpus-based text-to-speech (CB-TTS) has been actively studied throughout the world. Statistical training methods are generally applied to derive prosodic rules in CB-TTS, and the classification and regression tree (CART) is one of the most widely used methods. In this paper, we present an efficient CART training approach based on z-score phonetic normalization. Our idea comes from the fact that the three most important parameters in CART training for segmental prosody are the phone itself and its left and right neighboring phones, especially in Korean. Our approach effectively reduces the number of CART terminal nodes. The reduction ratios are approximately 14-94% for segmental duration estimation and 45-70% for intensity estimation. The experimental results also show that phonetic normalization slightly reduces the estimation errors.
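The abstract does not spell out how the z-score is computed, so the following is only a minimal sketch under one plausible reading: each prosodic value (duration or intensity) is normalized by the mean and standard deviation of its own phone class, a regression tree would then be trained on the normalized values, and predictions are mapped back to physical units. The function names, the use of the population standard deviation, and the toy data are assumptions for illustration and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def phone_stats(samples):
    """Collect per-phone (mean, std) from (phone, value) pairs.

    Assumption: statistics are pooled per phone identity only.
    """
    by_phone = defaultdict(list)
    for phone, value in samples:
        by_phone[phone].append(value)
    return {p: (mean(v), pstdev(v)) for p, v in by_phone.items()}


def to_zscore(phone, value, stats):
    """Normalize a raw value by its phone's mean and standard deviation."""
    mu, sigma = stats[phone]
    # Guard against phones whose samples are all identical.
    return 0.0 if sigma == 0 else (value - mu) / sigma


def from_zscore(phone, z, stats):
    """Map a (predicted) z-score back to the original unit."""
    mu, sigma = stats[phone]
    return mu + z * sigma


# Toy durations in milliseconds (hypothetical data, not from the paper).
samples = [("a", 80.0), ("a", 100.0), ("a", 120.0), ("s", 60.0), ("s", 90.0)]
stats = phone_stats(samples)

z = to_zscore("a", 120.0, stats)        # training target for the tree
restored = from_zscore("a", z, stats)   # what would be used at synthesis time
```

Under this reading, the tree no longer has to split on phone identity just to capture each phone's intrinsic duration or intensity, which is one way the reported reduction in terminal nodes could arise.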