ISCA Archive ISCSLP 2006

Multi-lingual TTS Speech Corpus Development

Yiqing Zu, Zhenhai Cao, Guilin Chen, Kesong Han, Peng Lu, Runqiang Yan, Kaizhi Wang, Zhenli Yu, Dongjian Yue, Aijun Li, Zhigang Yin

This paper presents an approach to multi-lingual speech corpus design, data collection, and phonetic annotation for text-to-speech (TTS) system development. Speech corpora for more than 10 languages and dialects share a uniform data structure, language-independent data management approaches, and common data processing procedures. A specially defined super phonetic symbol set is used for all languages and related dialects. These data management methods enable Motorola's multi-lingual TTS systems to employ a uniform architecture for cost function-based unit selection and speech synthesizer modules on both server-based and embedded platforms.
Keywords: Multi-lingual, TTS, speech corpus.
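The abstract names a cost function-based unit selection strategy but does not give its cost terms. The following is a minimal sketch of that general technique, not Motorola's implementation. It runs dynamic programming (Viterbi search) over a per-unit target cost and a pairwise concatenation cost. Several details are hypothetical and chosen only for illustration: the features (pitch in Hz, duration in ms), the weights, the `Unit`/`Target` fields, and the language-prefixed phone labels such as `cmn:n`, which loosely mimic a shared super phonetic symbol set.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Target:
    phone: str       # hypothetical language-prefixed phone label, e.g. "cmn:n"
    pitch: float     # desired F0 in Hz
    duration: float  # desired duration in ms

@dataclass(frozen=True)
class Unit:
    utterance: str   # recording the unit was cut from
    index: int       # position of the unit within that recording
    phone: str
    pitch: float
    duration: float

def target_cost(t: Target, u: Unit) -> float:
    """How well a candidate unit matches the target spec (weights are assumptions)."""
    mismatch = 0.0 if t.phone == u.phone else 10.0
    return mismatch + abs(t.pitch - u.pitch) / 50.0 + abs(t.duration - u.duration) / 20.0

def concat_cost(left: Unit, right: Unit) -> float:
    """Cost of joining two units; contiguous units from one recording join for free."""
    if left.utterance == right.utterance and right.index == left.index + 1:
        return 0.0
    return 1.0 + abs(left.pitch - right.pitch) / 50.0

def select_units(targets: list[Target], candidates: list[list[Unit]]) -> tuple[list[Unit], float]:
    """Viterbi search for the unit sequence minimizing total target + concatenation cost."""
    if not targets:
        return [], 0.0
    # cost[i][j]: best cumulative cost of a path ending at candidate j of target i
    cost = [[target_cost(targets[0], u) for u in candidates[0]]]
    back: list[list[int | None]] = [[None] * len(candidates[0])]
    for i in range(1, len(targets)):
        row_cost, row_back = [], []
        for u in candidates[i]:
            prev = [cost[i - 1][k] + concat_cost(p, u) for k, p in enumerate(candidates[i - 1])]
            k_best = min(range(len(prev)), key=prev.__getitem__)
            row_cost.append(prev[k_best] + target_cost(targets[i], u))
            row_back.append(k_best)
        cost.append(row_cost)
        back.append(row_back)
    # Trace back from the cheapest final candidate.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    total = cost[-1][j]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total
```

Because the concatenation cost is zero for units that were adjacent in the same recording, the search tends to prefer longer contiguous stretches of recorded speech. Real systems use richer prosodic and spectral features, but the structure of the search stays the same.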