Feature extraction and dimensionality reduction are arguably among the most important parts of the emotional speech recognition problem. In this work, we propose a new set of speech features based on the distribution of energy in the frequency domain. To investigate the applicability of the proposed features, we use the First International Audio/Visual Emotion Challenge (AVEC 2011) as the benchmark. For modeling and dimensionality reduction, we employ the lasso. We show that the 15 explicit spectral energy features suggested in this work lead to a more accurate model than those of all participants in the audio sub-challenge, even though this number of features is less than ten percent of the smallest feature set submitted to the challenge.
Centre for Pattern Analysis and Machine Intelligence
Index Terms: emotional speech recognition, feature extraction, dimensionality reduction
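The abstract pairs frequency-domain energy features with the lasso. The sketch below illustrates that pairing under loudly stated assumptions. The abstract does not define the features, so `band_energy_features` assumes a hypothetical definition: the fraction of total spectral energy in each of 15 equal-width FFT bands. The count of 15 only mirrors the number quoted above. `lasso_cd` is a generic lasso solver that uses cyclic coordinate descent. It is not the authors' implementation. Both function names are illustrative.

```python
import numpy as np


def band_energy_features(signal, n_bands=15):
    """Hypothetical spectral energy features (assumed definition): the
    fraction of total spectral energy in each of n_bands equal-width bands."""
    power = np.abs(np.fft.rfft(signal)) ** 2          # one-sided power spectrum
    bands = np.array_split(power, n_bands)            # contiguous, near-equal-width bands
    energies = np.array([band.sum() for band in bands])
    total = energies.sum()
    return energies / total if total > 0 else energies


def lasso_cd(X, y, alpha, n_iter=200):
    """Lasso by cyclic coordinate descent, minimizing
    (1 / 2n) * ||y - X w - b||^2 + alpha * ||w||_1 (intercept b unpenalized)."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean                   # centre so the intercept drops out
    col_sq = (Xc ** 2).sum(axis=0) / n
    w = np.zeros(p)
    residual = yc.copy()                              # equals yc - Xc @ w while w = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                              # constant feature: keep w_j = 0
            residual += Xc[:, j] * w[j]               # remove feature j's current contribution
            rho = Xc[:, j] @ residual / n
            # soft-thresholding: weights with |rho| <= alpha become exactly zero
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            residual -= Xc[:, j] * w[j]               # restore it with the updated weight
    b = y_mean - x_mean @ w
    return w, b


if __name__ == "__main__":
    # Synthetic stand-in for a feature matrix; no AVEC 2011 data is used here.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 15))
    y = 3 * X[:, 0] - 2 * X[:, 4] + 0.1 * rng.normal(size=200)
    w, b = lasso_cd(X, y, alpha=0.1)
    print("selected features:", np.flatnonzero(w))
    print("features of a 5-cycle tone:", band_energy_features(np.sin(2 * np.pi * 5 * np.arange(1024) / 1024)).round(3))
```

The L1 penalty drives the weights of uninformative features to exactly zero. This is how the lasso can perform modeling and dimensionality reduction in a single step. The regularization strength `alpha` controls how many features survive. In practice it would be chosen by cross-validation.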