ISCA Archive Interspeech 2018

Modeling Self-Reported and Observed Affect from Speech

Jian Cheng, Jared Bernstein, Elizabeth Rosenfeld, Peter W. Foltz, Alex S. Cohen, Terje B. Holmlund, Brita Elvevåg

Listeners hear joy/sadness and engagement/indifference in speech, even when the linguistic content is neutral. We measured audible emotion in spontaneous speech and related it to self-reports of affect given in response to questions such as “Are you hopeful?” Spontaneous speech and self-reports were both collected in sessions with an interactive mobile app and used to compare three affect measurements: self-report, listener judgement, and machine score. The app adapted a widely used measure of affective state to collect self-reported positive/negative affect, and it engaged users in spoken interactions. Each session elicited 11 affect self-reports and captured about 9 minutes of speech; there were 118 sessions by psychiatric patients and 227 sessions by non-clinical users. Speech recordings were evaluated for arousal and valence by clinical experts and by computer analysis of acoustic (non-linguistic) variables. The affect self-reports were reasonably reliable (α = 0.73 to 0.84). Combined affect ratings from clinical-expert listeners were reliable at the session level (α = 0.75 to 0.99), and acoustic feature analysis matched the expert ratings fairly well (0.36 < r < 0.72, mean 0.57), but neither the human nor the computed scores correlated highly with the standard self-reported affect values. These results are discussed in relation to common methods of developing and evaluating affect analysis.
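
The reliability (α) and agreement (r) figures quoted above are standard statistics, and a minimal sketch of how such values are commonly computed may help readers interpret them. The abstract does not say how the authors computed them, so the code below assumes α is Cronbach's alpha and r is the Pearson correlation. The function names and the toy ratings are hypothetical and are not taken from the study.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of rows, one row per session; each row holds one value
    per item (or per rater). Assumed layout, not the paper's data format.
    """
    k = len(scores[0])                      # number of items or raters
    if k < 2:
        raise ValueError("need at least two items or raters")
    columns = list(zip(*scores))            # one tuple per item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical ratings: 4 sessions rated by 3 listeners (made-up values).
    ratings = [[2, 3, 2], [4, 4, 5], [1, 2, 1], [5, 4, 5]]
    print("alpha:", round(cronbach_alpha(ratings), 3))

    # Hypothetical expert means versus machine scores for the same sessions.
    expert = [mean(row) for row in ratings]
    machine = [2.1, 3.9, 1.8, 4.2]
    print("r:", round(pearson_r(expert, machine), 3))
```

In this sketch a higher α means the items or raters vary together across sessions, while r measures how closely one set of scores tracks another; the two answer different questions, which is why reliable self-reports and reliable listener ratings can still correlate weakly with each other.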