In real-world applications involving synthetic speech, listeners should be able to comprehend the speech as easily and quickly as possible. The comprehensibility of speech depends not only on the intelligibility of phonemes, but also on suprasegmental factors (e.g. intonation, duration, accents, pauses). Whereas methods for assessing segmental intelligibility are well known, it is less clear how overall quality can be measured. One method that might be used is the comprehension test. In a comprehension test, subjects hear a few sentences or paragraphs and answer questions about the content. However, the percentage of questions answered correctly may overestimate the real-life comprehensibility of a synthesizer, because listeners may compensate for less-than-perfect intelligibility by spending extra cognitive effort, which may come at the cost of other tasks that must be performed at the same time, such as driving or monitoring.
NYNEX developed a comprehension test containing items that approximate real-world applications. The test items were carefully constructed to ensure that they could not be answered without hearing the speech. In this paper this test is evaluated. To that end, 322 subjects, recruited via an advertisement in a local newspaper, participated in the comprehension test, listening to either synthetic or natural speech. Half of the subjects also performed a concurrent mouse-controlled tracking task.
Despite the clearly superior quality of the natural speech, the comprehension test failed to show a statistically significant difference between the comprehensibility of natural and synthetic speech. Moreover, the tracking task did not appear to fulfill its intended function either. Detailed analysis of the data and the results led to a proposal for a redesigned procedure for measuring the comprehensibility of synthetic speech.