ISCA Archive ISCSLP 2002

Modeling duration and intonation in Mandarin Chinese synthesis with a neural network

Hongwei Ding, Oliver Jokisch, Hans Kruschke

Prosody control plays an important role in the naturalness of synthesized speech. Previous work has put great effort into rule-based and parameter-based prosodic models. More recently, neural networks have been employed to capture the complex interaction of the relevant prosodic factors. This paper presents a new method of learning and modeling duration and intonation in Mandarin Chinese synthesis with a neural network, which has proved to be an appropriate approach in our Mandarin synthesis system. The material for the study of prosodic components was extracted from a phonetically and prosodically labeled sentence database, uttered by the same speaker who recorded the synthesis inventory. The paper reports on the study of duration and intonation, the analysis of the database, the concept of the neural network model, and the evaluation of the training results.