ISCA Archive IDS 2002

Toward adaptive conversational interfaces: Modeling speech convergence with animated personas

Sharon Oviatt, Courtney Stevens, Rachel Coulston, Benfang Xiao, Matt Wesson, Cynthia Girand, Evan Mellander

During interpersonal conversation, both children and adults adapt the basic acoustic-prosodic features of their speech to converge with those of their conversational partner. However, comparable adaptivity in users’ speech signal has not been explored previously during human-computer interaction. In this study, 7-to-10-year-old children interacted with a multimodal conversational interface in which animated characters used text-to-speech output (TTS) to answer questions about marine biology. Analysis of children’s speech input to the animated characters revealed that it adapted to more closely match the TTS output they heard. When speaking with an extroverted animated character whose speech was faster paced and louder, children significantly increased their utterance amplitude and decreased the duration of their dialogue response latencies between conversational turns. In contrast, when speaking with an introverted partner, they decreased their amplitude and increased response latencies. These adaptations were dynamic, bi-directional, and generalized across different user groups and TTS voices. Implications are discussed for guiding children’s spoken language to be better synchronized and more easily processed by a conversational system, and for the future development of robust and adaptive conversational interfaces.


Cite as: Oviatt, S., Stevens, C., Coulston, R., Xiao, B., Wesson, M., Girand, C., Mellander, E. (2002) Toward adaptive conversational interfaces: Modeling speech convergence with animated personas. Proc. Multi-Modal Dialogue in Mobile Environments, paper 27

@inproceedings{oviatt02_ids,
  author={Sharon Oviatt and Courtney Stevens and Rachel Coulston and Benfang Xiao and Matt Wesson and Cynthia Girand and Evan Mellander},
  title={{Toward adaptive conversational interfaces: Modeling speech convergence with animated personas}},
  year=2002,
  booktitle={Proc. Multi-Modal Dialogue in Mobile Environments},
  pages={paper 27}
}