ISCA Archive Blizzard 2021

The IOA-ThinkIT system for Blizzard Challenge 2021

Zengqiang Shang, Ziyi Chen, Haozhe Zhang, Pengyuan Zhang

In this paper, we introduce the bilingual text-to-speech system submitted by IOA-ThinkIT to Blizzard Challenge 2021. This year’s challenge aims to build a Spanish speech synthesis system that also supports Spanish-English code-switched synthesis. We model pronunciation, style and duration separately. For style modeling, our approach adopts an analysis-synthesis scheme. In the analysis stage, a phoneme-level style encoder is used to extract speaker-independent style vectors. An RNN-based auto-regressive predictor is then built to predict style at inference. We apply adversarial speaker training to the text encoder of the backbone and to the duration predictor to enable cross-language timbre transfer and cross-language duration transfer. The evaluation conducted by the challenge organizers covers intelligibility, naturalness and similarity.

doi: 10.21437/Blizzard.2021-3

Cite as: Shang, Z., Chen, Z., Zhang, H., Zhang, P. (2021) The IOA-ThinkIT system for Blizzard Challenge 2021. Proc. The Blizzard Challenge 2021, 20-24, doi: 10.21437/Blizzard.2021-3

  author={Zengqiang Shang and Ziyi Chen and Haozhe Zhang and Pengyuan Zhang},
  title={{The IOA-ThinkIT system for Blizzard Challenge 2021}},
  booktitle={Proc. The Blizzard Challenge 2021},