ISCA Archive Interspeech 2012

Quality analysis of macroprosodic F0 dynamics in text-to-speech signals

Christoph R. Norrenbrock, Florian Hinterleitner, Ulrich Heute, Sebastian Möller

We present a study on the relation between fundamental frequency (F0) and its perceptual effect in the context of text-to-speech (TTS) synthesis. Features that essentially capture the intonational (macro-prosodic) properties of speech are introduced and analysed with regard to the following questions: (i) How does the prosodic variation of TTS signals differ from that of natural speech? (ii) Is there a functional relationship between the prosodic variation of TTS signals and their perceived quality? In answering these questions, we present novel approaches for the construction of non-intrusive quality estimators. The results reveal a substantial degree of systematic influence of prosodic variation on TTS quality.

Index Terms: Speech quality, instrumental quality assessment, text-to-speech (TTS), prosody.