ISCA Archive Eurospeech 2001

Design of an optimal continuous speech database for text-to-speech synthesis considered as a set covering problem

Helene Francois, Olivier Boeffard

Text-to-speech synthesis can be carried out by concatenating acoustic units obtained from a continuous speech database. This paper presents the optimization of such a database according to phonetic criteria. A large text corpus (311 572 sentences) is assembled, phonetized automatically, and condensed to 12 217 sentences so as to retain only 10 tokens of each of the most frequent triphonemes. This is an NP-hard set covering problem, which is solved approximately with a greedy algorithm. The condensed database covers 25% of the distinct triphonemes of the initial corpus, each represented by at least 10 tokens, which accounts for 95% of the triphoneme tokens of the initial corpus. The distribution of the triphonemes remains proportional to their frequency in the initial corpus.
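The greedy approximation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `greedy_cover`, the parameter `k`, the representation of a sentence as a list of unit labels, and the tie-breaking rule (lowest sentence index) are all assumptions made for illustration. At each step the sketch picks the sentence that supplies the most still-needed tokens, until every target unit has `k` tokens or no sentence can help.

```python
from collections import Counter


def greedy_cover(sentences, targets, k=10):
    """Greedy approximation of a set (multi-)cover.

    sentences: list of sentences, each a list of unit labels (e.g. triphonemes).
    targets:   set of units that should each be covered by at least k tokens.
    Returns (selected sentence indices in pick order, remaining need per unit).
    Hypothetical helper for illustration only.
    """
    # Per-sentence token counts, restricted to the target units.
    counts = [Counter(u for u in s if u in targets) for s in sentences]
    need = {u: k for u in targets}          # tokens still missing per unit
    remaining = set(range(len(sentences)))  # sentences not yet selected
    selected = []

    def gain(i):
        # Number of still-needed tokens that sentence i would supply.
        return sum(min(c, need[u]) for u, c in counts[i].items())

    while remaining and any(n > 0 for n in need.values()):
        # Highest gain wins; ties go to the lowest index (assumption).
        best = max(remaining, key=lambda i: (gain(i), -i))
        if gain(best) == 0:
            break  # no remaining sentence helps; some units stay under-covered
        selected.append(best)
        remaining.remove(best)
        for u, c in counts[best].items():
            need[u] = max(0, need[u] - c)

    return selected, need
```

For example, calling `greedy_cover([["a", "b"], ["b", "c"], ["a", "b", "c"], ["d"]], {"a", "b", "c"}, k=1)` selects only the third sentence, since it covers all three target units at once. This variant recomputes every gain at each step, which is simple but slow; a corpus of several hundred thousand sentences would likely need incremental gain updates.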