ISCA Archive Interspeech 2011

Perceptual quality dimensions of text-to-speech systems

Florian Hinterleitner, Sebastian Möller, Christoph Norrenbrock, Ulrich Heute

This paper analyzes the perceptual quality dimensions of state-of-the-art text-to-speech (TTS) systems. To this end, several pretests were conducted to determine a suitable set of attribute scales. The resulting 16 scales were used in a semantic differential on a diverse database containing 16 different TTS systems. A subsequent multidimensional analysis (Principal Axis Factor analysis with Promax rotation) yielded three underlying quality dimensions, which were labeled naturalness, disturbances, and temporal distortions. Mapping these factors onto the perceived overall quality revealed that naturalness contributes the most to the quality of TTS signals.
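The abstract does not state how the factors were mapped onto overall quality. One common choice for such a mapping is a multiple linear regression of the overall quality rating on the factor scores, and the following minimal sketch illustrates that idea under this assumption. The function name `fit_linear_mapping`, the factor scores, and the weights used to build the quality values are all hypothetical and purely synthetic; none of them are taken from the paper.

```python
def fit_linear_mapping(factor_scores, quality):
    """Least-squares fit of quality ~ intercept + sum_k w_k * factor_k.

    Returns [intercept, w_1, ..., w_K]. Solves the normal equations
    (X^T X) b = X^T y with Gauss-Jordan elimination and partial pivoting.
    """
    # Design matrix with a leading column of ones for the intercept.
    rows = [[1.0] + [float(v) for v in r] for r in factor_scores]
    n = len(rows[0])

    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, quality))]
        for i in range(n)
    ]

    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("factor scores are collinear; mapping is not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for k in range(n):
            if k != col:
                f = aug[k][col] / aug[col][col]
                aug[k] = [a - f * b for a, b in zip(aug[k], aug[col])]

    return [aug[i][n] / aug[i][i] for i in range(n)]


# Synthetic factor scores for six hypothetical stimuli, in the order
# (naturalness, disturbances, temporal distortions). NOT data from the paper.
scores = [
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
    (-1.0, 0.5, 0.0),
    (0.5, -1.0, 0.5),
    (0.0, 0.0, -1.0),
]

# Synthetic overall-quality values built from arbitrary illustrative weights.
quality = [3.0 + 0.8 * n - 0.3 * d - 0.2 * t for n, d, t in scores]

coefficients = fit_linear_mapping(scores, quality)
```

Because the synthetic quality values are an exact linear combination of the scores, the fit recovers the weights that were used to build them. With real ratings the fit would leave a residual, and the relative magnitude of the fitted weights (on comparably scaled factor scores) would indicate how strongly each dimension contributes to overall quality.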