ISCA Archive Interspeech 2012
ISCA Archive Interspeech 2012

Temporal and situational context modeling for improved dominance recognition in meetings

Martin Wöllmer, Florian Eyben, Björn Schuller, Gerhard Rigoll

We present and evaluate a novel approach towards automatically detecting a speaker's level of dominance in a meeting scenario. Since previous studies reveal that audio appears to be the most important modality for dominance recognition, we focus on the analysis of the speech signals recorded in multi-party meetings. Unlike recently published techniques which concentrate on frame-level hidden Markov modeling, we propose a recognition framework operating on segmental data and investigate context modeling on three different levels to explore possible performance gains. First, we apply a set of statistical functionals to capture large-scale feature-level context within a speech segment. Second, we consider bidirectional Long Short-Term Memory recurrent neural networks for long-range temporal context modeling between segments. Finally, we evaluate the benefit of situational context incorporation by simultaneously modeling speech of all meeting participants. Overall, our approach leads to a remarkable increase of recognition accuracy when compared to hidden Markov modeling.