This study uses an over-the-phone transcription task to address two issues in speech synthesis performance. The first is the relationship between the results of a controlled experiment with compliant volunteers and the performance of the technology in field conditions with real users on a similar task. The second is the impact of domain-specific customisation of prosody on synthesis quality for simple, unambiguous texts. The task is the transcription of names and addresses spoken by high-quality commercial speech synthesisers. Results show that (i) the largest differences between synthesisers generalise from laboratory to field conditions, although relative rankings vary slightly across different scales, and (ii) prosodic customisation yields significant and consistent improvements in transcription accuracy, requests for repetition, subjective ratings, and total time required to complete the task.
Keywords: Synthesis, Prosody, Evaluation, Intelligibility