ISCA Archive Interspeech 2014

Speech intonation for TTS: study on evaluation methodology

Javier Latorre, Kayoko Yanagisawa, Vincent Wan, BalaKrishna Kolluru, Mark J. F. Gales

The standard evaluation of intonation models is by means of non-referenced subjective tests (pair or MOS) in which subjects rate the quality or compare different samples without any explicit reference. These tests are usually conducted on an isolated sentence basis. However, for a single sentence, with no contextual information, there are multiple valid intonations. A subject's preference over this range of intonation patterns may be highly personal. This paper investigates the degree to which this ambiguity in the appropriate intonation pattern impacts the assessments of prosody for speech synthesis systems. To examine this problem, the variance of the F0 pattern of several vocoded sentences was modified and subjects were asked to compare multiple versions with different levels of modification in terms of preference/quality. Then, they were presented with the reference which defines the original intonation and asked about the similarity to that reference. The results show that subjects can identify the samples with no F0 variance modification when given a reference, but they do not always prefer them. Thus, non-referenced tests with no context, though they may help to analyse user acceptability, may not be appropriate to measure the performance of intonation models.
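
The abstract does not say how the F0 variance was modified, so the following Python sketch is only one plausible reading, not the authors' procedure. It assumes the contour is a list of per-frame F0 values in Hz with 0 marking unvoiced frames, and that the variance is scaled in the log-F0 domain around the utterance mean. The function name `scale_f0_variance` and the `factor` parameter are hypothetical.

```python
import math


def scale_f0_variance(f0_hz, factor):
    """Return a copy of an F0 contour whose log-F0 variance is scaled by `factor`.

    Assumptions (not taken from the paper): values are in Hz, 0 marks an
    unvoiced frame, and scaling is done in the log domain around the mean
    of the voiced frames, so the mean log-F0 is preserved.
    """
    if factor < 0:
        raise ValueError("factor must be non-negative")

    voiced_log = [math.log(f) for f in f0_hz if f > 0]
    if not voiced_log:
        return list(f0_hz)  # nothing voiced, nothing to modify

    mean = sum(voiced_log) / len(voiced_log)
    # Multiplying deviations by g multiplies the variance by g**2.
    gain = math.sqrt(factor)

    modified = []
    for f in f0_hz:
        if f > 0:
            modified.append(math.exp(mean + gain * (math.log(f) - mean)))
        else:
            modified.append(0.0)  # keep unvoiced frames unvoiced
    return modified
```

Under these assumptions, `factor=1.0` leaves the contour unchanged, values below 1 flatten the intonation, and values above 1 exaggerate it. A listening test like the one described would then compare vocoded versions generated with several values of `factor`.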

doi: 10.21437/Interspeech.2014-204

Cite as: Latorre, J., Yanagisawa, K., Wan, V., Kolluru, B., Gales, M.J.F. (2014) Speech intonation for TTS: study on evaluation methodology. Proc. Interspeech 2014, 2957-2961, doi: 10.21437/Interspeech.2014-204

  author={Javier Latorre and Kayoko Yanagisawa and Vincent Wan and BalaKrishna Kolluru and Mark J. F. Gales},
  title={{Speech intonation for TTS: study on evaluation methodology}},
  booktitle={Proc. Interspeech 2014},