ISCA Archive Blizzard 2021

The CSTR entry to the Blizzard Challenge 2021

Dan Wells, Pilar Oplustil-Gallegos, Simon King

We describe the text-to-speech (TTS) system submitted by the Centre for Speech Technology Research at the University of Edinburgh to the Blizzard Challenge 2021. We participated in the spoke task, which was to build a voice for Peninsular Spanish where test utterances contain a small number of English words. Our system is trained on monolingual data in Spanish and English, including some Spanish-accented English and some Spanish utterances containing English words, but without explicit supervision for either of these aspects. Input texts are represented using phonological feature vectors to encourage parameter sharing between the two languages despite their different phoneme inventories. When synthesizing test utterances, we perform automatic language identification to provide word-level language embeddings, and we apply pronunciation nativization rules to any detected English words to bring them closer to native Spanish phonology. In addition to the results of the main Blizzard Challenge evaluation, we present an analysis of the impact of nativization strategy on listener preferences, which may be relevant to the evaluation of code-switching TTS in general.
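The abstract does not specify the feature set, the language identifier, or the nativization rules, so the following is only a minimal sketch of how the three ideas can fit together, under stated assumptions. It assumes a toy binary phonological feature set, a tiny X-SAMPA-like inventory, and hypothetical helper names (`feature_vector`, `nativize`, `encode`). In place of the rule-based nativization described above, it substitutes a nearest-neighbour search in feature space. Word-level language tags are assumed to be supplied by an upstream identifier rather than computed here.

```python
from typing import Dict, FrozenSet, List, Tuple

# ASSUMPTION: hypothetical, heavily simplified binary phonological features.
FEATURES: List[str] = [
    "consonant", "sonorant", "voiced", "labial", "coronal", "dorsal",
    "stop", "fricative", "nasal", "high", "low", "front", "back", "round", "lax",
]

# ASSUMPTION: toy phone definitions; real inventories are much larger.
PHONES: Dict[str, FrozenSet[str]] = {
    "p": frozenset({"consonant", "labial", "stop"}),
    "b": frozenset({"consonant", "voiced", "labial", "stop"}),
    "t": frozenset({"consonant", "coronal", "stop"}),
    "d": frozenset({"consonant", "voiced", "coronal", "stop"}),
    "s": frozenset({"consonant", "coronal", "fricative"}),
    "k": frozenset({"consonant", "dorsal", "stop"}),
    "m": frozenset({"consonant", "sonorant", "voiced", "labial", "nasal"}),
    "i": frozenset({"sonorant", "voiced", "high", "front"}),
    "e": frozenset({"sonorant", "voiced", "front"}),
    "a": frozenset({"sonorant", "voiced", "low"}),
    "o": frozenset({"sonorant", "voiced", "back", "round"}),
    "u": frozenset({"sonorant", "voiced", "high", "back", "round"}),
    # English-only phones in this toy inventory.
    "z": frozenset({"consonant", "voiced", "coronal", "fricative"}),
    "v": frozenset({"consonant", "voiced", "labial", "fricative"}),
    "I": frozenset({"sonorant", "voiced", "high", "front", "lax"}),
}

SPANISH: List[str] = ["p", "b", "t", "d", "s", "k", "m", "i", "e", "a", "o", "u"]
LANG_IDS: Dict[str, int] = {"es": 0, "en": 1}


def feature_vector(phone: str) -> List[int]:
    """One binary vector space for both languages, so similar phones share inputs."""
    return [1 if f in PHONES[phone] else 0 for f in FEATURES]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of features on which two phones disagree."""
    return sum(x != y for x, y in zip(a, b))


def nativize(phone: str) -> str:
    """Map a non-Spanish phone to its nearest Spanish phone in feature space.

    ASSUMPTION: nearest-neighbour substitution stands in for hand-written rules.
    """
    if phone in SPANISH:
        return phone
    target = feature_vector(phone)
    return min(SPANISH, key=lambda p: hamming(feature_vector(p), target))


def encode(words: List[Tuple[List[str], str]]) -> List[Tuple[List[int], int]]:
    """Encode (phones, language) words as (feature vector, language id) per phone.

    ASSUMPTION: language tags come from an upstream word-level identifier.
    English words are nativized but keep their English language id.
    """
    encoded: List[Tuple[List[int], int]] = []
    for phones, lang in words:
        if lang == "en":
            phones = [nativize(p) for p in phones]
        encoded.extend((feature_vector(p), LANG_IDS[lang]) for p in phones)
    return encoded
```

For example, `encode([(["m", "e", "s", "a"], "es"), (["z", "I", "p"], "en")])` rewrites the English-tagged word "zip" as `s i p` while its three phones still carry language id 1, so the word-level language embedding and the nativized pronunciation reach the model together.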