ISCA Archive SUS 1995

The strains of emotional stress in synthetic speech

Iain R. Murray, John L. Arnott, Elissaveta A. Rohwer

Text-to-speech and other speech output technologies are becoming increasingly widespread, especially as input technologies improve and make possible applications with both speech input and output. Although speech output systems now generally have very high intelligibility, most are still easily identified as artificial voices, and no commercial system yet allows prosodic variation due to emotion and related factors. This is largely due to the complexity of incorporating such naturalness factors and to our very limited knowledge of which voice changes actually occur as a result of the speaker's emotion. However, prosodic content in synthetic speech is increasingly seen as important as interactive computer systems become more common, and there is renewed interest in investigating human vocal emotion and in extending synthesis models to allow greater prosodic variation.

This paper reviews progress to date in the investigation of human vocal emotion and its simulation in synthetic speech, and presents the further research required to develop this area.