ISCA Archive Blizzard 2018

The CSTR entry to the 2018 Blizzard Challenge

Felipe Espic, Avashna Govender, Manuel Sam Ribeiro, Cassia Valentini-Botinhao, Oliver Watts

As in the 2016 and 2017 Blizzard Challenges, this year's task is to train on expressively read children’s storybooks and to synthesise speech in the same domain. This gives us an opportunity to investigate the effectiveness of several techniques we have developed when applied to expressive and prosodically varied audiobook data. This paper describes the text-to-speech system entered by The Centre for Speech Technology Research into the 2018 Blizzard Challenge. The system is a hybrid synthesiser in which a halfphone unit selection synthesiser is driven by the output of a neural network based acoustic and duration model. We adopt the same neural network based models used in last year's entry, with a different unit selection component. We discuss the performance of our system by reporting the results of the formal listening tests provided by the challenge.