ISCA Archive SSPR 2003

Mental state detection of dialogue system users via spoken language

Tong Zhang, Mark Hasegawa-Johnson, Stephen E. Levinson

This paper presents an approach to simulating, from their spoken language, the mental activities of children during their interaction with computers. The mental activities are categorized into three states: confidence, confusion and frustration. Two knowledge sources are used in the detection. One is prosody, which indicates the utterance type and the user's attitude. The other is embedded key words/phrases, which help interpret the utterances. Moreover, children's speech is found to exhibit acoustic characteristics very different from those of adults' speech. Given this uniqueness, this paper applies a technique based on vocal tract length normalization (VTLN) to compensate for both inter-speaker and intra-speaker variability in children's speech. The detected key words/phrases are then integrated with prosodic information as the cues for a maximum a posteriori (MAP) decision on mental states. Tests on a set of 50 utterances collected from the project experiment showed a classification accuracy of 74%.
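The MAP decision that combines prosodic and key-word cues can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the prosodic cue has already been reduced to a single hypothetical utterance-type label (`statement`, `question`, `exclamation`), that key-phrase spotting yields a set of detected phrases, and that all cues are conditionally independent given the mental state (a naive Bayes assumption). The priors, likelihood tables, key phrases, and the function name `map_state` are invented placeholders, not values or names from the paper. In practice the tables would be estimated from labelled training utterances. Scores are accumulated in log space to avoid numerical underflow.

```python
import math
from typing import Set

STATES = ("confidence", "confusion", "frustration")

# Hypothetical priors P(s); illustrative placeholders, not values from the paper.
PRIORS = {"confidence": 0.5, "confusion": 0.3, "frustration": 0.2}

# Hypothetical likelihoods P(prosodic class | s) for a coarse utterance-type cue.
PROSODY_LIK = {
    "confidence": {"statement": 0.7, "question": 0.2, "exclamation": 0.1},
    "confusion": {"statement": 0.2, "question": 0.7, "exclamation": 0.1},
    "frustration": {"statement": 0.3, "question": 0.1, "exclamation": 0.6},
}

# Hypothetical P(phrase detected | s) for a few illustrative key phrases.
KEYWORD_LIK = {
    "i don't know": {"confidence": 0.05, "confusion": 0.6, "frustration": 0.3},
    "what": {"confidence": 0.1, "confusion": 0.5, "frustration": 0.2},
    "i give up": {"confidence": 0.01, "confusion": 0.1, "frustration": 0.5},
}


def map_state(prosody: str, detected: Set[str]) -> str:
    """Return argmax_s of log P(s) + log P(prosody | s) + sum_k log P(cue_k | s).

    Each key phrase contributes P(detected | s) if it was spotted and
    1 - P(detected | s) if it was not (a Bernoulli naive Bayes assumption).
    """
    best_state, best_score = STATES[0], -math.inf
    for s in STATES:
        score = math.log(PRIORS[s]) + math.log(PROSODY_LIK[s][prosody])
        for phrase, lik in KEYWORD_LIK.items():
            p = lik[s]
            score += math.log(p if phrase in detected else 1.0 - p)
        if score > best_score:
            best_state, best_score = s, score
    return best_state
```

With these placeholder tables, a question-like utterance in which "i don't know" was spotted is classified as `confusion`, while an exclamation containing "i give up" is classified as `frustration`. Swapping in a different cue model (for example, continuous prosodic features with Gaussian likelihoods) would only change how each log-likelihood term is computed, not the argmax structure of the decision.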