ISCA Archive Eurospeech 2001

Variable-length acoustic units inference for text-to-speech synthesis

Olivier Boeffard

The best voices in text-to-speech (TTS) synthesis are currently obtained with systems based on the concatenation of acoustic units. In such systems, choosing the units whose concatenation will produce the acoustic message is a crucial stage. Moreover, current TTS systems most often use acoustic units that correspond to variable-length phonetic descriptions. This article proposes an original framework for automatically determining an optimal set of variable-length acoustic units.