ISCA Archive ICSLP 1992

Is % overall error rate a valid measure of speech synthesiser and natural speech performance at the segmental level?

Mikael Goldstein, Ove Till

The intelligibility of 18 Swedish consonants embedded in 3 symmetrical vowel contexts (a_a, i_i and o_o) was assessed using the VCV test procedure SOAP v3.0 (ESPRIT PROJECT 2589 (SAM)) implemented on a PC. 24 subjects with normal hearing assessed the 54 different VCV combinations. Two synthesis systems, KTH and INFOVOX, were assessed, with Natural speech as a baseline condition. Although the % overall error rate for Natural speech was very low (5.56%) compared to the KTH (8.72%) and INFOVOX (12.27%) synthesis systems, the errors were concentrated in three VCV words that obtained very high error rates: otjo (54%), atja (50%) and oho (34%). For the VCV word oho, Natural speech yielded a significantly higher % error rate than either of the two synthesis systems.

Differences between two correlated proportions (% correct) obtained for each VCV combination and system, assessed by the same group of subjects, were tested using the new program module DPROP.EXE, which takes the result files generated by the SOAP software as input. The program tests for differences between all VCV combinations generated by two different systems that have been assessed by the same group of subjects. Testing (at the 2% level, two-tailed t-test) of VCV words yielded 4 significant (*) differences between KTH and Natural speech (+ongo*, +ingi*, +ini* and -oho*), 9 between INFOVOX and Natural speech (+ovo*, +ingi*, +ongo*, +omo*, +ibi*, +opo*, +olo*, +ama* and -oho*) and 3 between INFOVOX and KTH synthesis (+ovo*, +opo* and +ibi*). To be significant, the % error difference had to be of the order of 30%. The use of the % overall error rate as a valid diagnostic measure of synthesiser performance is discussed: it is a measure that treats all insignificant VCV differences the same way as significant differences, by adding them together into a % overall error rate. This measure is compared to significance testing of individual VCV words, as well as to the use of Natural speech as a 'true' baseline.
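A test of two correlated proportions for a single VCV word can be sketched as follows. This is only an illustration and not the DPROP.EXE algorithm, which the abstract does not describe: it assumes a McNemar-type statistic with a normal approximation rather than the t-test mentioned above, and the function names and subject counts are hypothetical.

```python
import math


def correlated_proportions_z(only_a_wrong: int, only_b_wrong: int) -> float:
    """McNemar-type z statistic for two correlated proportions.

    only_a_wrong: subjects who misidentified the word for system A only.
    only_b_wrong: subjects who misidentified the word for system B only.
    Subjects who were right (or wrong) on both systems do not contribute.
    """
    discordant = only_a_wrong + only_b_wrong
    if discordant == 0:
        return 0.0  # no discordant pairs, so no evidence of a difference
    return (only_a_wrong - only_b_wrong) / math.sqrt(discordant)


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value under the standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts for one VCV word judged by the same 24 subjects.
n_subjects = 24
only_a_wrong, only_b_wrong = 7, 0

z = correlated_proportions_z(only_a_wrong, only_b_wrong)
p = two_tailed_p(z)
diff_percent = 100 * (only_a_wrong - only_b_wrong) / n_subjects

print(f"error difference = {diff_percent:.1f}%, z = {z:.3f}, p = {p:.4f}")
print("significant at the 2% level" if p < 0.02 else "not significant at the 2% level")
```

In this hypothetical case, 7 of 24 subjects erring on one system only corresponds to an error difference of roughly 29 percentage points. With so few subjects, the normal approximation is crude, and an exact binomial test on the discordant pairs may be preferable.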