The extraction and selection of acoustic features are crucial steps in developing a system for classifying emotions in speech. Most work in the field uses prosodic features, often in combination with spectral and glottal features, and selects a subset of these features for emotion classification. In these methods, features are mostly chosen without regard to the relationships and structures that exist among them. Taking these into account, however, can be beneficial, potentially both for interpretability and for classification performance. To this end, this paper proposes a structured sparse logistic regression model that incorporates the hierarchical structure of features derived from prosody, spectral envelope, and glottal information. The proposed model performs tree-structured sparse feature selection and emotion classification simultaneously. Evaluation on the Berlin emotional database showed substantial improvement over the conventional sparse logistic regression model.
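
As a rough illustration of what a tree-structured sparse logistic regression objective can look like, the sketch below writes out the standard tree-structured group lasso form for a binary classification problem. It is an assumption made for exposition, not the formulation stated in this paper: the symbols $w$, $b$, $\lambda$, $\eta_g$, and $\mathcal{G}$ are introduced here, and the proposed model (including its handling of more than two emotion classes) may differ.

\[
\min_{w,\,b}\; \frac{1}{n}\sum_{i=1}^{n} \log\!\left(1 + \exp\!\left(-y_i\,(w^{\top}x_i + b)\right)\right) \;+\; \lambda \sum_{g \in \mathcal{G}} \eta_g\,\lVert w_g \rVert_2
\]

Here $x_i$ is the acoustic feature vector of utterance $i$, $y_i \in \{-1, +1\}$ is its label, and $\mathcal{G}$ is assumed to contain one group per node of the feature tree, with $w_g$ collecting the weights of all features under that node. Under this assumption, a node near the root (for example, all prosodic features) can be zeroed out as a whole, while nodes deeper in the tree allow finer selection within a family that is kept. Setting every group to a single feature recovers the conventional $\ell_1$-penalized sparse logistic regression.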