ISCA Archive AVSP 2003

Using speech and gesture to explore user states in multimodal dialogue systems

Rui P. Shi, Johann Adelhardt, Viktor Zeißler, Anton Batliner, Carmen Frank, Elmar Nöth, Heinrich Niemann

Modern dialogue systems should interpret the user's behavior and state of mind in the same way as human beings do, that is, in a multimodal manner: communication is not limited to verbal utterances, as is the case for most state-of-the-art dialogue systems, but involves several modalities, e.g., speech, gesture, and facial expression. The design of a dialogue system must therefore be adapted to multimodal interaction, and all of these modalities have to be combined within the dialogue system. This paper describes the recognition of a user's internal state of mind using a prosody classifier based on artificial neural networks, combined with a discrete Hidden Markov Model (HMM) for gesture analysis. Our experiments show that both input modalities can be used to identify the user's internal state. We show that an improvement of up to 70% can be achieved when fusing both modalities.
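As a hypothetical illustration only (the abstract does not give the fusion rule), a minimal sketch of combining per-state scores from a prosody classifier and a gesture HMM via weighted log-linear late fusion might look like the following; the state labels, weight, and function names are assumptions, not the authors' method.

```python
import numpy as np

# Assumed label set for illustration; the paper's actual user states may differ.
USER_STATES = ["neutral", "angry", "joyful", "hesitant"]

def fuse_scores(ann_posteriors, hmm_log_likelihoods, weight=0.5):
    """Hypothetical late fusion of two modality classifiers for one turn.

    ann_posteriors      : P(state | prosody) from the neural network, one value per state
    hmm_log_likelihoods : log P(gesture | state) from the discrete HMM, one value per state
    weight              : relative weight of the prosody modality (0..1)
    """
    # Move ANN posteriors into log domain (clipped to avoid log(0)),
    # then combine the two score vectors with a convex weighting.
    ann_log = np.log(np.clip(ann_posteriors, 1e-12, 1.0))
    fused = weight * ann_log + (1.0 - weight) * hmm_log_likelihoods
    # Decide for the user state with the highest fused score.
    return USER_STATES[int(np.argmax(fused))]

# Toy example: each modality alone is ambiguous, fusion resolves it.
ann = np.array([0.40, 0.35, 0.15, 0.10])          # prosody ANN posteriors
hmm = np.log(np.array([0.20, 0.45, 0.20, 0.15]))  # gesture HMM likelihoods
print(fuse_scores(ann, hmm))  # -> "angry"
```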