ISCA Archive Interspeech 2014

Speaker diarization using gesture and speech

Binyam Gebrekidan Gebre, Peter Wittenburg, Sebastian Drude, Marijn Huijbregts, Tom Heskes

We demonstrate how speaker diarization can be solved using parametric models of both gesture and speech. The novelty of our solution is that we treat diarization as a speaker recognition problem after learning speaker models from the speech samples that co-occur with gestures: the occurrence of a gesture indicates the presence of speech, and the location of the gesture indicates the identity of the speaker. This approach offers several advantages: performance comparable to the state of the art, faster computation, and more flexibility. In our implementation, Gaussian mixture models capture the voice characteristics of each person and of all persons together, and gamma distributions model gestural activity based on features extracted from Motion History Images. Tests on 4.24 hours of the AMI meeting data show that, compared with the AMI system, our solution improves the DER score by 19% on speech-only segments and by 4% on all segments including silence.
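
The core idea of the abstract, training speaker models on the audio frames that coincide with a person's gestures and then labelling every frame by recognition, can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each audio frame is already a feature vector, and that a hypothetical list `gesture_labels` gives the identity of the gesturing person for each frame (or `None`). It also substitutes a single diagonal Gaussian per speaker for the Gaussian mixture models named in the abstract, and it omits the gamma-distribution gesture model, the all-persons model, and the speech/silence decision.

```python
import math
from collections import defaultdict


def fit_diag_gaussian(frames):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so a tiny training set cannot produce a zero variance.
    var = [
        max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dim)
    ]
    return mean, var


def log_likelihood(frame, model):
    """Log-density of one frame under a diagonal Gaussian model."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )


def diarize(frames, gesture_labels):
    """Label every frame with the most likely speaker.

    gesture_labels[i] is a speaker id when that person gestures during
    frame i, otherwise None (assumed input, not defined by the paper).
    """
    # Step 1: collect the audio frames that co-occur with each person's gestures.
    by_speaker = defaultdict(list)
    for frame, label in zip(frames, gesture_labels):
        if label is not None:
            by_speaker[label].append(frame)

    # Step 2: learn one voice model per speaker from those frames.
    models = {spk: fit_diag_gaussian(fs) for spk, fs in by_speaker.items()}

    # Step 3: treat diarization as recognition over all frames.
    return [
        max(models, key=lambda spk: log_likelihood(frame, models[spk]))
        for frame in frames
    ]
```

With two well-separated toy speakers, a few gesture-labelled frames are enough for the remaining unlabelled frames to be assigned to the nearer model. Real meeting audio would need richer features and mixture models, as the abstract describes.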


doi: 10.21437/Interspeech.2014-141

Cite as: Gebre, B.G., Wittenburg, P., Drude, S., Huijbregts, M., Heskes, T. (2014) Speaker diarization using gesture and speech. Proc. Interspeech 2014, 582-586, doi: 10.21437/Interspeech.2014-141

@inproceedings{gebre14_interspeech,
  author={Binyam Gebrekidan Gebre and Peter Wittenburg and Sebastian Drude and Marijn Huijbregts and Tom Heskes},
  title={{Speaker diarization using gesture and speech}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={582--586},
  doi={10.21437/Interspeech.2014-141},
  issn={2308-457X}
}