ISCA Archive Interspeech 2017

Unit Selection with Hierarchical Cascaded Long Short Term Memory Bidirectional Recurrent Neural Nets

Vincent Pollet, Enrico Zovato, Sufian Irhimeh, Pier Batzu

Bidirectional recurrent neural nets have demonstrated state-of-the-art performance for parametric speech synthesis. In this paper, we introduce a top-down application of recurrent neural net models to unit-selection synthesis. A hierarchical cascaded network graph predicts context phone duration, speech unit encoding and frame-level logF0 information, which serve as targets for the unit search. The new approach is compared with an existing state-of-the-art hybrid system that uses Hidden Markov Models as the basis for the statistical unit search.
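
To make the idea of predicted targets guiding the unit search more concrete, the following is a minimal sketch, not the system described in the paper. It assumes the network outputs have already been computed and compares them against candidate units with a weighted squared distance. The names `Target`, `Candidate`, `target_cost` and `select_units`, the cost weights, and the distance measure are all hypothetical. The sketch also omits the concatenation (join) cost and the dynamic-programming search that a complete unit-selection system would normally use.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Target:
    """Predicted targets for one unit position (hypothetical layout)."""
    duration: float            # predicted phone duration, in seconds
    encoding: Sequence[float]  # predicted speech unit encoding
    log_f0: Sequence[float]    # predicted frame-level logF0 values


@dataclass
class Candidate:
    """An inventory unit described in the same feature space (hypothetical)."""
    unit_id: str
    duration: float
    encoding: Sequence[float]
    log_f0: Sequence[float]


def mean_sq_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference over the overlapping part of two sequences."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    # zip stops at the shorter sequence, so only the overlap is compared.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / n


def target_cost(
    target: Target,
    cand: Candidate,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # assumed weights
) -> float:
    """Weighted distance between the predicted targets and a candidate unit."""
    w_dur, w_enc, w_f0 = weights
    return (
        w_dur * (target.duration - cand.duration) ** 2
        + w_enc * mean_sq_diff(target.encoding, cand.encoding)
        + w_f0 * mean_sq_diff(target.log_f0, cand.log_f0)
    )


def select_units(
    targets: List[Target], candidates: List[List[Candidate]]
) -> List[str]:
    """Pick, per position, the candidate with the lowest target cost.

    Simplification: each position is decided independently, with no join cost.
    """
    return [
        min(cands, key=lambda c: target_cost(t, c)).unit_id
        for t, cands in zip(targets, candidates)
    ]
```

In this simplified form each position is decided independently; adding a join cost between neighbouring candidates would turn the selection into a shortest-path search over the candidate lattice.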