ISCA Archive Interspeech 2022
ISCA Archive Interspeech 2022

Cross-Cultural Comparison of Gradient Emotion Perception: Human vs. Alexa TTS Voices

Iona Gessinger, Michelle Cohn, Georgia Zellou, Bernd Möbius

This study compares how American (US) and German (DE) listeners perceive emotional expressiveness from Amazon Alexa text-to-speech (TTS) and human voices. Participants heard identical stimuli, manipulated from an emotionally ‘neutral' production to three levels of increased happiness generated by resynthesis. Results show that, for both groups, ‘happiness' manipulations lead to higher ratings of emotional valence (i.e., more positive) for the human voice. Moreover, there was a difference across the groups in their perception of arousal (i.e., excitement): US listeners show higher ratings for human voices with manipulations, while DE listeners perceive the Alexa voice as sounding less ‘excited' overall. We discuss these findings in terms of theories of cross-cultural emotion perception and human-computer interaction.