During expressive speech, the voice is enriched to convey not only the intended semantic message but also the speaker's state of mind and intention. Our goal is to design a tool that can be used in a speech-to-speech translation system to automatically classify Bangla utterances into three modalities: Statement, Question and Command. Although pitch and intensity features are commonly used to recognize sentence modality, it is not clear which aspects of the pitch and intensity contours are salient for recognizing sentence modality in Bangla. A set of 30 features derived from 680 speech samples is analyzed to identify the most discriminative subset of features for Bangla. Three well-known classification algorithms, namely the J48 decision tree, Support Vector Machine (SVM) and k-Nearest Neighbor (k-NN), are tested with both the full feature set and the reduced subset. k-NN achieves a global classification accuracy of 96.08% using the reduced subset of features.
Index Terms: sentence modality, recognition, prosodic features
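
To make the comparison between the full feature set and a reduced subset concrete, the following is a minimal sketch of k-NN classification over a selectable subset of features. It is an illustration under stated assumptions, not the implementation used in this work: the data are synthetic Gaussian vectors rather than real pitch and intensity measurements, and the choice of k = 3, the 75/25 train/test split, the number of informative features and all function names are hypothetical.

```python
import math
import random
from collections import Counter

LABELS = ("statement", "question", "command")


def make_synthetic_samples(n_per_class=40, n_features=30, n_informative=5, seed=0):
    """Build hypothetical stand-in data (NOT real prosodic features).

    Only the first n_informative features depend on the class;
    the remaining features are pure noise.
    """
    rng = random.Random(seed)
    samples = []
    for class_idx, label in enumerate(LABELS):
        for _ in range(n_per_class):
            vec = [
                rng.gauss(3.0 * class_idx if j < n_informative else 0.0, 1.0)
                for j in range(n_features)
            ]
            samples.append((vec, label))
    rng.shuffle(samples)
    return samples


def knn_predict(train, query, k, feature_idx):
    """Majority vote among the k nearest training samples.

    Euclidean distance is computed only over the features in feature_idx.
    """
    q = [query[j] for j in feature_idx]
    dists = sorted(
        (math.dist([x[j] for j in feature_idx], q), y) for x, y in train
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k, feature_idx):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k, feature_idx) == y for x, y in test)
    return hits / len(test)


samples = make_synthetic_samples()
split = int(0.75 * len(samples))          # assumed 75/25 train/test split
train, test = samples[:split], samples[split:]

full_set = list(range(30))                # all 30 features
reduced_set = list(range(5))              # assumed "selected" subset

acc_full = accuracy(train, test, 3, full_set)
acc_reduced = accuracy(train, test, 3, reduced_set)
print(f"full set: {acc_full:.2%}, reduced subset: {acc_reduced:.2%}")
```

In this sketch the reduced subset is fixed by construction. In practice it would come from a feature selection step applied to the measured features, and the accuracies printed here say nothing about the results reported above.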