ISCA Archive Blizzard 2018

The USTC System for Blizzard Challenge 2018

Yuan Jiang, Xiao Zhou, Chuang Ding, Ya-jun Hu, Zhen-Hua Ling, Li-Rong Dai

This paper introduces the USTC speech synthesis system for Blizzard Challenge 2018. The task is to build a speech synthesis system from a 6.5-hour children’s audiobook corpus. The submitted system follows the one we proposed for Blizzard Challenge 2017: a hidden Markov model (HMM)-based unit selection system, with improvements to both the front-end text processing and the back-end acoustic modeling. In the front end, long short-term memory (LSTM)-based recurrent neural networks (RNNs) were adopted for tones and break indices (ToBI) prediction. In the back end, two models were built for unit selection. First, an LSTM-RNN-based acoustic model was built, and its hidden-layer output was adopted as a context embedding feature. Second, a deep neural network (DNN)-based unit embedding model was built, and its unit vectors were adopted as phone unit features. Evaluation results show that our system performed well on all aspects of the paragraph test, demonstrating the effectiveness of the proposed system.
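The abstract does not specify how the context embeddings and unit vectors enter the unit-selection search. As a minimal sketch, assuming a Viterbi (dynamic-programming) search in which the target cost is the Euclidean distance between a target position's embedding and a candidate unit's vector, and the concatenation cost is zero for units that were adjacent in the recorded corpus and a fixed penalty otherwise, the search could look like the following. `Unit`, `join_cost`, `select_units`, and `penalty` are hypothetical names chosen for illustration; this is not the USTC implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """A candidate phone unit from the recorded corpus (hypothetical structure)."""
    corpus_pos: int                # position of the unit in the recorded corpus
    embedding: Tuple[float, ...]   # e.g. a learned unit vector


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors (target cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def join_cost(prev: Unit, cur: Unit, penalty: float) -> float:
    """Assumed concatenation cost: free for corpus-adjacent units, else a fixed penalty."""
    return 0.0 if cur.corpus_pos == prev.corpus_pos + 1 else penalty


def select_units(targets: List[Sequence[float]],
                 candidates: List[List[Unit]],
                 penalty: float = 1.0) -> List[Unit]:
    """Viterbi search for the unit sequence with the lowest total target + join cost."""
    # best[k]: lowest accumulated cost of any path ending at candidates[t][k]
    best = [distance(targets[0], u.embedding) for u in candidates[0]]
    backptrs: List[List[int]] = []
    for t in range(1, len(targets)):
        new_best, ptrs = [], []
        for u in candidates[t]:
            # Choose the predecessor minimising accumulated cost plus join cost
            j = min(range(len(candidates[t - 1])),
                    key=lambda i: best[i] + join_cost(candidates[t - 1][i], u, penalty))
            new_best.append(best[j] + join_cost(candidates[t - 1][j], u, penalty)
                            + distance(targets[t], u.embedding))
            ptrs.append(j)
        best = new_best
        backptrs.append(ptrs)
    # Backtrack from the cheapest final candidate to recover the full path
    k = min(range(len(best)), key=best.__getitem__)
    path = [k]
    for ptrs in reversed(backptrs):
        k = ptrs[k]
        path.append(k)
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)]


if __name__ == "__main__":
    # Toy example: two target positions with 1-D embeddings
    targets = [(0.0,), (1.0,)]
    candidates = [
        [Unit(10, (0.0,)), Unit(50, (0.2,))],
        [Unit(11, (1.3,)), Unit(70, (1.0,))],
    ]
    print([u.corpus_pos for u in select_units(targets, candidates, penalty=1.0)])  # → [10, 11]
    print([u.corpus_pos for u in select_units(targets, candidates, penalty=0.1)])  # → [10, 70]
```

Raising `penalty` favours longer runs of corpus-contiguous units at the expense of target match: in the toy example, a penalty of 1.0 keeps the contiguous pair (10, 11), while 0.1 switches the second unit to the closer-matching unit at position 70.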