ISCA Archive Interspeech 2012

A robust unsupervised arousal rating framework using prosody with cross-corpora evaluation

Daniel Bone, Chi-Chun Lee, Shrikanth S. Narayanan

This paper presents an unsupervised method for producing a bounded rating of affective arousal from speech. One of the major challenges in such behavioral signal classification is the design of methods that generalize well across domains and datasets. We propose a framework that provides robustness across databases in three ways: selecting coherent features based on empirical and theoretical evidence, fusing activation confidences from multiple features, and effectively weighting the soft labels without knowing the true labels. On four arousal databases, the Spearman's rank correlations (and binary classification accuracies) are 0.62 (73%), 0.77 (86%), 0.70 (82%), and 0.65 (73%).

Index Terms: arousal rating, activation, unsupervised, knowledge-based, inter-rater reliability, cross-corpora
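
The two evaluation measures quoted in the abstract, Spearman's rank correlation and binary classification accuracy, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the toy data, and the decision threshold of 0.0 are all assumptions made for illustration, and the abstract does not state the range of the bounded rating or how the binary classes were defined.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def binary_accuracy(scores, labels, threshold=0.0):
    """Fraction of items where (score > threshold) matches the 0/1 label.

    The threshold is a hypothetical choice, not taken from the paper.
    """
    hits = sum((s > threshold) == bool(l) for s, l in zip(scores, labels))
    return hits / len(scores)


# Hypothetical toy data: automatic ratings, human reference ratings,
# and human-derived high-arousal (1) / low-arousal (0) labels.
ratings = [-0.8, -0.2, 0.1, 0.6, 0.9]
reference = [1.0, 2.5, 2.0, 4.0, 4.5]
labels = [0, 0, 0, 1, 1]

rho = spearman_rho(ratings, reference)
acc = binary_accuracy(ratings, labels)
print(f"Spearman rho = {rho:.2f}, binary accuracy = {acc:.0%}")
```

Because Spearman's correlation depends only on rank order, it suits an unsupervised rating whose scale need not match the scale used by human annotators.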