This paper investigates features relevant for classifying two speaking styles: conversational and clear (e.g., hyper-articulated) speech. Spectral and prosodic features were automatically extracted from speech and classified using decision tree classifiers and multilayer perceptrons, which achieved accuracies of about 71% and 77%, respectively. More interestingly, we found that of the 56 features, only about 9 are needed to capture most of the predictive power. While perceptual studies have shown that spectral cues are more useful than prosodic features for intelligibility [1], we find that prosodic features are more important for classification.
[1] A. Kain, A. Amano-Kusumoto, and J.-P. Hosom, "Hybridizing conversational and clear speech to determine the degree of contribution of acoustic features to intelligibility," Journal of the Acoustical Society of America, vol. 124, no. 4, pp. 2308-2319, 2008.