One of the main challenges in emotion recognition from speech is discriminating emotions in the valence domain (positive versus negative). While acoustic features characterize the activation/arousal dimension well (excited versus calm), they usually fail to discriminate between sentences with different valence (e.g., happiness versus anger). This paper focuses exclusively on the valence dimension, which is key in many behavioral problems (e.g., depression). First, a regression analysis is conducted to identify the most informative features: separate support vector regression (SVR) models are trained on different feature groups. The results reveal that spectral and F0 features produce the most accurate predictions of valence. Then, sentences with similar activation but different valence are carefully studied, and the discriminative power of individual features in the valence domain is assessed with logistic regression analysis. This controlled experiment reveals differences between positive and negative emotions in the F0 distribution (e.g., in its skewness). The study also uncovers characteristic trends in the spectral domain.
Index Terms: valence, emotion recognition, speech analysis, emotion representation
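The abstract does not specify the exact features or classifier settings. The sketch below illustrates two of the ingredients it mentions: a skewness functional computed over an F0 contour, and a univariate logistic regression that scores how well a single feature separates positive from negative valence. The function names are hypothetical. It assumes that unvoiced frames are coded as 0 Hz, and it fits the model by plain gradient descent so that only the Python standard library is needed. This is an illustration under those assumptions, not the authors' implementation.

```python
import math
from typing import Sequence, Tuple

def f0_skewness(f0_contour: Sequence[float]) -> float:
    """Skewness (population moments) of the voiced F0 values.
    Assumption: unvoiced frames are coded as 0 and are dropped."""
    voiced = [f for f in f0_contour if f > 0]
    n = len(voiced)
    if n < 3:
        raise ValueError("need at least 3 voiced frames")
    mean = sum(voiced) / n
    m2 = sum((f - mean) ** 2 for f in voiced) / n  # second central moment
    m3 = sum((f - mean) ** 3 for f in voiced) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant contour: skewness treated as zero
    return m3 / m2 ** 1.5

def _sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_univariate_logistic(
    x: Sequence[float], y: Sequence[int], lr: float = 0.1, epochs: int = 2000
) -> Tuple[float, float, float, float]:
    """Fit p(y=1 | x) = sigmoid(b0 + b1 * z), where z is the standardized feature.
    y = 1 marks positive valence (hypothetical labeling).
    lr and epochs are arbitrary illustrative choices."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n) or 1.0
    z = [(v - mu) / sd for v in x]
    b0 = b1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for zi, yi in zip(z, y):
            err = _sigmoid(b0 + b1 * zi) - yi  # gradient of the log-loss
            g0 += err
            g1 += err * zi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1, mu, sd

def predict_proba(model: Tuple[float, float, float, float], value: float) -> float:
    """Probability of positive valence for a raw feature value."""
    b0, b1, mu, sd = model
    return _sigmoid(b0 + b1 * (value - mu) / sd)
```

In a controlled setup like the one described, such a model would be fitted one feature at a time on sentences matched for activation. The sign and magnitude of `b1` would then indicate the direction and strength of the feature's association with valence.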