ISCA Archive Eurospeech 1997

Simplification of TTS architecture vs. operational quality

Eric Keller

Many applications in mobile telephony and portable computing require high-quality speech synthesis systems with a very modest computational footprint. Our text-to-speech system for French gives satisfactory performance in phonetisation and prosody with considerably reduced computational resources. Using the Mons (Belgium) diphone database, the program's current version runs in real time on Pentium-type PCs or Mac PPCs. The code requires 442 k, the minimum RAM requirement is 4700 k, and the minimum disk requirement is 5560 k. Phonetisation and prosody processing have been brought to a first level of optimal compromise between quality and computational footprint. Major further reductions in space requirements would probably necessitate a re-evaluation of sound generation procedures.