ISCA Archive Blizzard 2021

The IOA-ThinkIT system for Blizzard Challenge 2021

Zengqiang Shang, Ziyi Chen, Haozhe Zhang, Pengyuan Zhang

In this paper, we introduce the bilingual text-to-speech system submitted by IOA-ThinkIT to the Blizzard Challenge 2021. This year’s challenge aims to build a Spanish speech synthesis system that also supports Spanish-English code-switched synthesis. We model pronunciation, style, and duration separately. For style modeling, our approach adopts an analysis-synthesis scheme. In the analysis stage, a phoneme-level style encoder is used to extract speaker-independent style vectors. An auto-regressive RNN predictor is then built to predict style at inference. We apply adversarial speaker training to the text encoder of the backbone and the duration predictor to enable cross-language timbre transfer and cross-language duration transfer. The evaluation conducted by the challenge organizers covers intelligibility, naturalness, and similarity.
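
The abstract does not state how adversarial speaker training is realized. One common realization is a gradient reversal step: a speaker classifier is trained on the text-encoder output, and the gradient flowing back into the encoder is negated, so the encoder learns features from which the speaker is harder to predict. The following is a minimal pure-Python sketch under that assumption, not the authors' implementation. The names `speaker_classifier_grad`, `reverse_gradient`, and `lam`, the linear classifier, and all numbers are hypothetical, and the encoder output is updated directly as a stand-in for updating encoder parameters.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def speaker_classifier_grad(hidden, weights, speaker_id):
    """Linear speaker classifier with softmax cross-entropy (assumed form).

    Returns the loss and its gradient with respect to `hidden`,
    the (toy) text-encoder output for one phoneme.
    """
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    probs = softmax(logits)
    loss = -math.log(probs[speaker_id])
    # d(loss)/d(logit_k) = p_k - 1[k == speaker_id]
    dlogits = [p - (1.0 if k == speaker_id else 0.0)
               for k, p in enumerate(probs)]
    # d(loss)/d(hidden_j) = sum_k dlogits[k] * weights[k][j]
    grad_hidden = [
        sum(dlogits[k] * weights[k][j] for k in range(len(weights)))
        for j in range(len(hidden))
    ]
    return loss, grad_hidden


def reverse_gradient(grad, lam=1.0):
    """Gradient reversal: negate (and scale by lam) the classifier gradient
    before it reaches the encoder."""
    return [-lam * g for g in grad]


# Toy setup: 2-dimensional encoder output, 2 speakers (hypothetical values).
hidden = [1.0, -0.5]
weights = [[2.0, 0.0], [0.0, 2.0]]
speaker_id = 0
learning_rate = 0.1

loss_before, grad = speaker_classifier_grad(hidden, weights, speaker_id)
encoder_grad = reverse_gradient(grad, lam=1.0)

# Ordinary gradient descent on the reversed gradient moves the encoder
# output in the direction that makes the speaker harder to classify.
hidden_after = [h - learning_rate * g for h, g in zip(hidden, encoder_grad)]
loss_after, _ = speaker_classifier_grad(hidden_after, weights, speaker_id)

print("speaker loss before:", loss_before)
print("speaker loss after: ", loss_after)
```

In this sketch the speaker classification loss rises after the encoder-side update, which is the intended effect: the classifier itself would still be trained with the unreversed gradient, while the encoder is pushed toward speaker-independent representations.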