ISCA Archive Interspeech 2009

Recognising interest in conversational speech - comparing bag of frames and supra-segmental features

Björn Schuller, Gerhard Rigoll

It is common knowledge that affective and emotion-related states are acoustically well modelled on a supra-segmental level. Nonetheless, successes are reported for frame-level processing, either by means of dynamic classification or by multi-instance learning techniques. In this work, a quantitative, feature-type-wise comparison between frame-level and supra-segmental analysis is carried out for the recognition of interest in human conversational speech. To shed light on the respective differences, the same classifier, namely Support Vector Machines, is used in both cases: once by clustering a ‘bag of frames’ of unknown sequence length employing Multi-Instance Learning techniques, and once by applying statistical functionals that project the time series onto a static feature vector. The Audiovisual Interest Corpus of naturalistic interest serves as the database.
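As an illustration of the supra-segmental route, the projection of a frame-level time series onto a static feature vector might be sketched as follows. This is a minimal sketch, not the authors' feature extraction: the choice of functionals (mean, standard deviation, minimum, maximum), the function name `apply_functionals`, and the toy frame values are all assumptions made for the example.

```python
from statistics import fmean, pstdev


def apply_functionals(frames):
    """Map a variable-length list of frame vectors to one fixed-length vector.

    `frames` is a list of equally sized lists, one per frame, holding
    frame-level descriptor values (hypothetical values in this sketch).
    """
    if not frames:
        raise ValueError("need at least one frame")
    static = []
    # zip(*frames) yields the contour of each descriptor over time.
    for contour in zip(*frames):
        # Four illustrative functionals per descriptor (an assumption).
        static.extend([fmean(contour), pstdev(contour), min(contour), max(contour)])
    return static


# Two hypothetical utterances of different length, two descriptors per frame.
short_utt = [[1.0, 10.0], [3.0, 30.0]]
long_utt = [[2.0, 5.0], [4.0, 5.0], [6.0, 5.0], [8.0, 5.0]]

short_vec = apply_functionals(short_utt)
long_vec = apply_functionals(long_utt)
print(len(short_vec), len(long_vec))
```

Both utterances yield a vector of the same length regardless of their number of frames, which is what allows a static classifier such as a Support Vector Machine to be applied directly. In the bag-of-frames setting, by contrast, the frames themselves would be kept as a set of instances sharing one utterance-level label.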