ISCA Archive SSW 2023

Importance of Human Factors in Text-To-Speech Evaluations

Lev Finkelstein, Joshua Camp, Rob Clark

Both mean opinion score (MOS) evaluations and preference tests in text-to-speech are often associated with high rating variance. In this paper we investigate two important factors that affect that variance. One is the variance that comes from how raters are picked for a specific test; the other is the dynamic behavior of individual raters across time. This paper aims to increase awareness of these issues when designing an evaluation experiment, since the standard confidence interval at the test level cannot incorporate the variance associated with these two factors. We show the impact of the two sources of variance and how they can be mitigated. We demonstrate that simple improvements in experiment design, such as using a smaller number of rating tasks per rater, can significantly improve the experiment's confidence intervals and reproducibility at no extra cost.
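
The claim about tasks per rater can be illustrated with a small simulation. The sketch below is not the paper's method or data: it assumes a simple additive model (rating = true MOS + per-rater bias + per-rating noise), and the function names, the standard deviations, and the true MOS of 3.8 are all hypothetical values chosen for illustration. Ratings are also not clipped to a 1-5 scale. Under these assumptions, with a fixed total number of ratings, the test-level mean varies less across repeated experiments when each rater contributes fewer ratings, and a standard error that treats all ratings as independent understates that variation.

```python
import random
import statistics


def simulate_mos(total_ratings, tasks_per_rater, rng,
                 rater_sd=0.5, noise_sd=0.7, true_mos=3.8):
    """Run one simulated test and return its mean rating.

    Assumed model (illustrative only): each rater has a fixed bias drawn
    from N(0, rater_sd), and each rating adds independent N(0, noise_sd).
    """
    n_raters = total_ratings // tasks_per_rater
    ratings = []
    for _ in range(n_raters):
        bias = rng.gauss(0.0, rater_sd)  # drawn once per rater
        for _ in range(tasks_per_rater):
            ratings.append(true_mos + bias + rng.gauss(0.0, noise_sd))
    return statistics.fmean(ratings)


def replication_sd(total_ratings, tasks_per_rater, n_reps=500, seed=0):
    """Standard deviation of the test-level mean across repeated tests."""
    rng = random.Random(seed)
    means = [simulate_mos(total_ratings, tasks_per_rater, rng)
             for _ in range(n_reps)]
    return statistics.stdev(means)


def naive_se(total_ratings, rater_sd=0.5, noise_sd=0.7):
    """Standard error if all ratings were treated as independent."""
    return ((rater_sd ** 2 + noise_sd ** 2) / total_ratings) ** 0.5


if __name__ == "__main__":
    total = 600  # same rating budget in both designs
    for k in (5, 60):
        print(f"{k:>2} tasks/rater: sd of mean = {replication_sd(total, k):.3f}")
    print(f"naive standard error = {naive_se(total):.3f}")
```

In this model the variance of the test-level mean is rater_sd²/R + noise_sd²/N for R raters and N ratings, so spreading the same N ratings over more raters shrinks the first term while the cost in ratings stays constant.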