ISCA Archive ICSLP 2002

Learning syllable duration and intonation of Mandarin Chinese

Oliver Jokisch, Hongwei Ding, Hans Kruschke, Guntram Strecha

The perceived quality of synthetic speech strongly depends on its prosodic naturalness. This paper presents a neural-network-based prosody model of Mandarin Chinese. Using a small but specially designed syllable database and an enhanced linguistic feature set, the novel approach enables the training of syllable duration and syllable-based F0 model points, and is suitable for multilingual prosody control in concatenative speech synthesis. The paper describes the database design, the neural network model, the training results for Chinese, and the perceptual evaluation. The results indicate the importance of appropriate database design and of the enhanced linguistic feature set. In perceptual tests, resynthesized stimuli using predicted duration values receive a mean opinion score (MOS) comparable to that of natural speech, about 4.8.