The SCUT Text-To-Speech System for the Blizzard Challenge 2021

Weiheng Liu, Yitao Yang, Jiangwei Li, Jinghui Zhong

In this paper, we present our solution for the Spoke task of the Blizzard Challenge 2021, which is to build a code-switched speech synthesis system for European Spanish and English using only a Spanish dataset. The major challenges in synthesizing code-switched text are obtaining a language-independent representation of linguistic information and transferring the speaker's voice across languages. To address these difficulties, we apply a set of phonological embeddings derived from the International Phonetic Alphabet (IPA) to represent bilingual text uniformly and to facilitate knowledge sharing among languages. Meanwhile, our system uses a predefined speaker embedding to control the voice of the generated speech. In addition, we introduce a variational autoencoder to extract latent features from speech in order to compensate for data differences between multiple datasets. The evaluation results demonstrate the effectiveness of our method for code-switched speech synthesis.
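To illustrate why an IPA-derived representation can let Spanish and English share knowledge, here is a minimal sketch. It assumes a hypothetical, hand-written table of binary articulatory features. The feature inventory, the `IPA_FEATURES` table, and the functions `phone_vector`, `embed`, and `hamming` are illustrative assumptions rather than the system's actual embedding. In this sketch, each IPA phone maps to the same vector regardless of the language it came from.

```python
from typing import Dict, List, Tuple

# Hypothetical articulatory feature inventory (an assumption, not the paper's).
FEATURES: Tuple[str, ...] = (
    "consonant", "vowel", "voiced", "bilabial", "dental", "alveolar",
    "velar", "plosive", "fricative", "high", "front", "back",
)

# Hypothetical IPA-to-feature table covering only a handful of phones.
IPA_FEATURES: Dict[str, Tuple[str, ...]] = {
    "p": ("consonant", "bilabial", "plosive"),
    "b": ("consonant", "voiced", "bilabial", "plosive"),
    "t": ("consonant", "alveolar", "plosive"),
    "s": ("consonant", "alveolar", "fricative"),
    "z": ("consonant", "voiced", "alveolar", "fricative"),  # English /z/
    "k": ("consonant", "velar", "plosive"),
    "θ": ("consonant", "dental", "fricative"),  # Spanish "cielo", English "think"
    "i": ("vowel", "voiced", "high", "front"),
    "u": ("vowel", "voiced", "high", "back"),
}


def phone_vector(phone: str) -> List[int]:
    """Binary feature vector for one IPA phone, independent of language."""
    active = IPA_FEATURES[phone]  # raises KeyError for phones not in the toy table
    return [1 if feature in active else 0 for feature in FEATURES]


def embed(phones: List[str]) -> List[List[int]]:
    """Map an IPA phone sequence (from either language) to feature vectors."""
    return [phone_vector(p) for p in phones]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of features on which two phone vectors differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # /z/ is not a Spanish phoneme, but it differs from Spanish /s/ only in voicing.
    print(hamming(phone_vector("s"), phone_vector("z")))
    print(embed(["θ", "i"]))
```

In this toy table, English /z/, which a Spanish-only training set would rarely or never contain, differs from Spanish /s/ by a single feature (voicing). A model operating on such vectors could therefore relate it to familiar phones instead of treating it as an unseen symbol.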

