ISCA Archive Blizzard 2021

The SRCB-SL system for Blizzard Challenge 2021

Chunhui Lu, Xue Wen, Ruolan Liu, Xiaoyan Lou, Liming Song, June Sig Sung, Gunu Jho, Hyoungmin Park

This paper presents the SRCB-SL text-to-speech system that participated in Blizzard Challenge 2021. This year’s Challenge was in European Spanish and came with 5 hours of clean speech data from a female native speaker. It included two tasks: a hub task that asked participants to build a voice from the provided data and synthesize all-Spanish speech, and a spoke task in which the target speech contained a few English words. Our system used a three-stage pipeline: text analysis, acoustic model, and vocoder. The text analyzer combined several existing and newly developed modules to convert input text to a sequence of Spanish phonemes with prosodic boundary (break) markers. In the spoke task, English phonemes were mapped to their Spanish counterparts. The acoustic model was built around FastSpeech and converted the phoneme sequences from text analysis to mel-spectrograms. For the vocoder we used HiFi-GAN, which we trained on the Challenge data and fine-tuned using predicted mel-spectrograms as input. The same system was used for both tasks. Challenge results showed that our system (identified as K) performed well on most of the criteria, which validated the effectiveness of our method.
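The English-to-Spanish phoneme mapping used for the spoke task could look roughly like the sketch below. This is only an illustration: the table EN_TO_ES, the function to_spanish_phonemes, the ARPAbet-style English symbols, and the (symbol, language) token format are all assumptions made for the example, since the abstract does not give the actual mapping or data format.

```python
# Hypothetical English (ARPAbet-style) to Spanish phoneme table.
# The entries are illustrative guesses, not the mapping used in the paper.
EN_TO_ES = {
    "AE": ["a"],
    "IH": ["i"],
    "V": ["b"],
    "Z": ["s"],
    "SH": ["s"],
    "ER": ["e", "r"],  # one English phoneme may map to several Spanish ones
}


def to_spanish_phonemes(tokens):
    """Map a mixed-language phoneme sequence to Spanish phonemes only.

    tokens: list of (symbol, lang) pairs, where lang is "es" or "en".
    Spanish symbols and break markers pass through unchanged; English
    symbols are replaced via EN_TO_ES, falling back to the lowercased
    symbol when no entry exists (an assumption of this sketch).
    """
    out = []
    for symbol, lang in tokens:
        if lang == "en":
            out.extend(EN_TO_ES.get(symbol, [symbol.lower()]))
        else:
            out.append(symbol)
    return out
```

With such a mapping in place, the acoustic model only ever sees Spanish phonemes, which is consistent with the abstract's statement that the same system was used for both tasks.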

doi: 10.21437/Blizzard.2021-10

Cite as: Lu, C., Wen, X., Liu, R., Lou, X., Song, L., Sung, J.S., Jho, G., Park, H. (2021) The SRCB-SL system for Blizzard Challenge 2021. Proc. The Blizzard Challenge 2021, 59-63, doi: 10.21437/Blizzard.2021-10

  author={Chunhui Lu and Xue Wen and Ruolan Liu and Xiaoyan Lou and Liming Song and June Sig Sung and Gunu Jho and Hyoungmin Park},
  title={{The SRCB-SL system for Blizzard Challenge 2021}},
  booktitle={Proc. The Blizzard Challenge 2021},