ISCA Archive ICSLP 2002

Reducing the footprint of the IBM trainable speech synthesis system

Dan Chazan, Ron Hoory, Zvi Kons, Dorel Silberstein, Alexander Sorin

This paper presents a novel approach to concatenative speech synthesis. The approach reduces the dataset size of a concatenative text-to-speech system, namely the IBM trainable speech synthesis system, by more than an order of magnitude. A speech representation based on spectral acoustic features is used both for computing the cost function during segment selection and for speech generation. Initial results indicate that even with a dataset of only a few megabytes, it is possible to achieve quality significantly higher than that of existing small-footprint formant-based synthesizers.
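The abstract does not specify the cost function, so the following is only a minimal sketch of generic cost-based segment selection, not the IBM system's method. It assumes each candidate segment is stored as a sequence of spectral feature frames (for example, cepstral vectors), that the target cost is the Euclidean distance between a desired feature vector and the segment's mean frame, and that the concatenation cost is the spectral distance across the join. The function names and the weights `w_t` and `w_c` are hypothetical. The lowest-cost sequence is found by Viterbi dynamic programming.

```python
import numpy as np


def target_cost(target, unit):
    # Hypothetical target cost: distance between the desired spectral
    # features and the unit's average frame.
    return float(np.linalg.norm(target - unit.mean(axis=0)))


def concat_cost(prev_unit, next_unit):
    # Hypothetical join cost: spectral mismatch between the last frame
    # of one unit and the first frame of the next.
    return float(np.linalg.norm(prev_unit[-1] - next_unit[0]))


def select_units(targets, candidates, w_t=1.0, w_c=1.0):
    """Return the index of the chosen candidate for each target position.

    targets:    list of T arrays of shape (D,)
    candidates: list of T lists; candidates[t][k] is an array of shape (frames, D)
    """
    T = len(targets)
    # cost[t][k]: cheapest cumulative cost of a path ending in candidate k at position t
    cost = [np.array([w_t * target_cost(targets[0], u) for u in candidates[0]])]
    back = [np.zeros(len(candidates[0]), dtype=int)]
    for t in range(1, T):
        prev = candidates[t - 1]
        cur_cost = np.empty(len(candidates[t]))
        cur_back = np.empty(len(candidates[t]), dtype=int)
        for k, unit in enumerate(candidates[t]):
            # Cumulative cost of reaching this unit from each predecessor.
            trans = cost[t - 1] + np.array([w_c * concat_cost(p, unit) for p in prev])
            j = int(np.argmin(trans))
            cur_back[k] = j
            cur_cost[k] = trans[j] + w_t * target_cost(targets[t], unit)
        cost.append(cur_cost)
        back.append(cur_back)
    # Trace the cheapest path backwards through the stored predecessors.
    path = [int(np.argmin(cost[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]


if __name__ == "__main__":
    targets = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
    candidates = [
        [np.array([[0.0, 0.0], [0.0, 0.0]]), np.array([[5.0, 5.0], [5.0, 5.0]])],
        [np.array([[1.0, 1.0], [1.0, 1.0]]), np.array([[0.0, 0.0], [2.0, 2.0]])],
    ]
    print(select_units(targets, candidates))  # [0, 1]
```

In the usage example, both candidates at the second position match the target equally well, and the join cost decides the outcome: the second candidate starts with the same spectrum on which the first selected unit ends.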