ISCA Archive Blizzard 2010

The WISTON Text to Speech System for Blizzard Challenge 2010

Jianhua Tao, Shifeng Pan, Ya Li, Zhengqi Wen, Yang Wang

This paper introduces the speech synthesis system developed by the Institute of Automation, Chinese Academy of Sciences (CASIA) for the Blizzard Challenge 2010. The large-corpus-based speech synthesis system, WISTON, was built to synthesize Mandarin speech. This year, a new prosodic structure prediction model was used, which is more precise and compact than the previous one. Furthermore, two kinds of syllable segmentation, rough and precise, were performed on the Mandarin speech corpus. The rough segmentation labels were used in prosody model training and in the unit selection stage. During the concatenation stage, both kinds of segmentation labels were used to determine the start and end positions of the waveform fragment of each unit. Experimental results show that this approach is effective. The evaluation results show that, apart from similarity, which is very high, the mean opinion score (MOS) and word error rate (WER) of the WISTON system are at an average level.