ISCA Archive Interspeech 2018

Audio-Visual Prediction of Head-Nod and Turn-Taking Events in Dyadic Interactions

Bekir Berker Türker, Engin Erzin, Yücel Yemez, Metin Sezgin

Head-nods and turn-taking both contribute significantly to conversational dynamics in dyadic interactions. Timely prediction and use of these events are valuable for dialog management systems in human-robot interaction. In this study, we present an audio-visual prediction framework for head-nod and turn-taking events that can also be used in real-time systems. Prediction systems based on Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are trained on human-human conversational data. Unimodal and multimodal classification performance for head-nod and turn-taking events is reported on the IEMOCAP dataset.