ISCA Archive ISCSLP 2008

HMM-Based Mixed-Language (Mandarin-English) Speech Synthesis

Yao Qian, Hou-Wei Cao, Frank K. Soong

utterances have become more common among bilingually educated people, such as college students in China. Accordingly, it is highly desirable that TTS systems synthesize mixed-language speech properly. Recently, we proposed an HMM-based bilingual TTS system that synthesizes a target language when only monolingual source-language recordings from a speaker are available. In this paper, we extend it to synthesize mixed-language sentences. A cross-language state mapping is first established between the decision trees built from the English and Mandarin recordings of a bilingual speaker. Via this mapping, English words or phrases embedded in Mandarin sentences can then be synthesized. The bilingual state mapping is then extended to a monolingual speaker to perform mixed-language synthesis. Perceptual test results show (1) decent intelligibility, confirmed by an English word transcription accuracy of 86%, and (2) good speech quality, with an average mean opinion score (MOS) of 3.2.

Keywords: Speech synthesis, HMM-based TTS, Mixed-language speech synthesis
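As an illustration of the cross-language state mapping mentioned in the abstract, the following minimal sketch pairs each English decision-tree leaf with its nearest Mandarin leaf. The abstract does not state which distance measure is used, so the symmetric Kullback-Leibler divergence between diagonal-covariance Gaussians is an assumption here, and the function names (`kl_diag`, `symmetric_kl`, `map_states`), the leaf names, and the toy parameters are all hypothetical.

```python
import math


def kl_diag(mean_p, var_p, mean_q, var_q):
    """KL divergence KL(p || q) between two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(p, q):
    """Symmetric KL divergence; p and q are (mean, variance) tuples."""
    return kl_diag(p[0], p[1], q[0], q[1]) + kl_diag(q[0], q[1], p[0], p[1])


def map_states(source, target):
    """Map every source leaf to the target leaf with minimum symmetric KL.

    Assumption: each leaf is summarized by a single diagonal Gaussian,
    stored as name -> (mean vector, variance vector).
    """
    return {
        s_name: min(target, key=lambda t_name: symmetric_kl(s_gauss, target[t_name]))
        for s_name, s_gauss in source.items()
    }


# Toy leaf-node distributions (hypothetical values, two-dimensional features).
english = {
    "en_leaf_0": ([0.0, 1.0], [1.0, 1.0]),
    "en_leaf_1": ([5.0, 5.0], [1.0, 2.0]),
}
mandarin = {
    "zh_leaf_0": ([4.8, 5.1], [1.0, 2.0]),
    "zh_leaf_1": ([0.1, 0.9], [1.0, 1.0]),
}

mapping = map_states(english, mandarin)
print(mapping)
```

With such a mapping, an English state sequence could in principle be replaced by the mapped Mandarin states before parameter generation; whether the paper applies the mapping in this direction or with this measure is not stated in the abstract.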