ISCA Archive SpeechProsody 2024

Voice transforms for affect control in Irish speech synthesis

Anna Maria Giovannini, Zihan Wang, Maria O'Reilly, Ailbhe Ní Chasaide, Christer Gobl

This paper reports on an experiment using voice transforms to alter the perceived affect in synthetic utterances of Irish, with a view to controlling affect in the spoken output of an Irish AAC device. The transforms were guided by prior experience and by voice source analyses of utterances produced by a male speaker in angry, happy, sad, bored, relaxed and neutral voices. The neutral utterance was modified to incorporate stylised voice transforms targeting these affects. Modifications included global shifts affecting the entire utterance, local shifts affecting only accented syllables, and a combination of global and local changes. Stimuli targeting sad and happy also included tempo changes, and those targeting happy additionally included formant shifts. Listeners’ evaluations most positively identified the high-activation affects happy and angry. Stimuli targeting sad were also effective, while those targeting bored and relaxed were not, although bored was positively associated with some of the sad-targeting stimuli. Results for low-activation states are confounded by the fact that the neutral stimulus was to some degree biased towards bored, sad and relaxed affects. Of the three types of transform (global, local and combined), the most effective appears to vary with the targeted affect.