ISCA Archive LW 2009

Technology for Processing Non-verbal Information in Speech

Nick Campbell

Current speech technology is founded upon text. People don't speak text, so there is often a mismatch between the expectations of the system and the performance of its users. Talk in social interaction of course involves the exchange of propositional content (which can be expressed through text), but it also involves social networking and the expression of interpersonal relationships, as well as displays of emotion, affect, interest, etc. A computer-based system that processes human speech, whether an information-providing service, a translation device, part of a robot, or an entertainment system, must not only be able to process the text of that speech, but must also be able to interpret the underlying intentions, or acts, of the speaker who produced it. It is not enough for a machine just to know what a person is saying; it must also know what that person is doing with each utterance as part of an interactive discourse.