The comprehension of natural and synthetic speech in Swedish and American English was investigated using a sentence-by-sentence listening paradigm. The synthesised speech was generated by the KTH text-to-speech systems. Results indicated that sentence listening times were significantly longer for synthetic than for natural speech only in American English. Text difficulty was found to be a significant variable in both Swedish and American English for sentence listening times and word recognition, but only in American English for proposition recognition. The results are discussed in terms of the quality of the synthesisers and the factors involved in comprehension.
Keywords: comprehension, synthetic speech, text-to-speech