ISCA Archive Interspeech 2014

Speaker diarization using eye-gaze information in multi-party conversations

Koji Inoue, Yukoh Wakabayashi, Hiromasa Yoshimoto, Tatsuya Kawahara

We present a novel speaker diarization method that uses eye-gaze information in multi-party conversations. In real environments, speaker diarization, i.e., speech activity detection for each conversation participant, is challenging because of distant talking and ambient noise. In contrast, eye-gaze information is robust against acoustic degradation, and eye-gaze behavior is presumed to play an important role in turn-taking and thus in predicting utterances. The proposed method stochastically integrates eye-gaze information with acoustic information for speaker diarization. Specifically, this paper investigates three models for multi-modal integration. Experimental evaluations on real poster sessions demonstrate that the proposed method improves speaker diarization accuracy over the baseline acoustic method.
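The abstract does not specify how the stochastic integration is carried out. A minimal sketch of one common form of multi-modal integration, a weighted log-linear combination of per-frame acoustic and gaze scores, is shown below; the function name, the fusion weight, and the log-likelihood-ratio inputs are illustrative assumptions, not details from the paper.

```python
import numpy as np

def fuse_speech_activity(acoustic_loglik, gaze_loglik, weight=0.7, threshold=0.0):
    """Illustrative log-linear fusion of per-frame speech/non-speech scores.

    acoustic_loglik, gaze_loglik: per-frame log-likelihood ratios
    (speech vs. non-speech) for one participant. `weight` trades off
    the two modalities; its value is a free parameter here, not a
    setting taken from the paper.
    """
    fused = (weight * np.asarray(acoustic_loglik)
             + (1.0 - weight) * np.asarray(gaze_loglik))
    # Frames whose fused score exceeds the threshold are labeled as speech.
    return fused > threshold
```

For example, a frame where both modalities favor speech is labeled speech, while a frame where both favor silence is not; the weight lets the acoustic evidence dominate when it is reliable.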


doi: 10.21437/Interspeech.2014-137

Cite as: Inoue, K., Wakabayashi, Y., Yoshimoto, H., Kawahara, T. (2014) Speaker diarization using eye-gaze information in multi-party conversations. Proc. Interspeech 2014, 562-566, doi: 10.21437/Interspeech.2014-137

@inproceedings{inoue14_interspeech,
  author={Koji Inoue and Yukoh Wakabayashi and Hiromasa Yoshimoto and Tatsuya Kawahara},
  title={{Speaker diarization using eye-gaze information in multi-party conversations}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={562--566},
  doi={10.21437/Interspeech.2014-137},
  issn={2308-457X}
}