ISCA Archive ICSLP 1992

Towards a robust speech interface for teleoperation systems

James H. Bradford

A number of recent papers on the design of speech interfaces have argued in favour of short command sequences. The argument usually proceeds as follows: suppose that a given speech recognizer has a word recognition rate of 95%. The probability of recognizing a 10-word utterance without error is then about 60% (i.e. 0.95^10). In a more realistic setting with background noise and speaker variability, recognition rates often drop below 80%. With an 80% recognition rate, the probability of recognizing a 10-word utterance without error is only 11%. This, together with the difficulties encountered with command confirmation and error correction dialogues, tends to support a preference for short command sequences.
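The arithmetic behind this argument can be checked with a short sketch. It assumes that word errors are independent, which the argument above implies but does not state; the function name is illustrative only.

```python
def utterance_success_probability(word_rate: float, n_words: int) -> float:
    """Probability that every word in an utterance is recognized correctly.

    Assumes each word is recognized independently with the same
    per-word recognition rate (a simplifying assumption).
    """
    return word_rate ** n_words


for rate in (0.95, 0.80):
    p = utterance_success_probability(rate, 10)
    print(f"word rate {rate:.2f}: 10-word utterance correct with p = {p:.3f}")
# word rate 0.95: 10-word utterance correct with p = 0.599
# word rate 0.80: 10-word utterance correct with p = 0.107
```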

However, this paper presents a design methodology which favours processing commands in larger aggregates called "command paragraphs." The use of command paragraphs dramatically increases the amount of locally available contextual information. Typically, a paragraph consists of a number of individual commands that, when taken together, completely specify how a task is to be performed. Thus the grammar of the constituent commands, and knowledge about how the commands work together to achieve a task, can be used to correct recognition errors made on a word-by-word basis.
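As a rough illustration of how grammatical context can repair a word-level error, the toy sketch below picks, for each word position, the best-scoring recognizer candidate that the grammar allows. It is not the prototype's method: the three-slot grammar, the vocabulary, the candidate scores, and the function name are all hypothetical.

```python
# Hypothetical toy grammar: every command is VERB DETERMINER NOUN.
GRAMMAR_SLOTS = [
    {"grasp", "weld", "release", "launch"},  # verbs
    {"the"},                                 # determiners
    {"beam", "satellite", "arm"},            # nouns
]


def correct_command(candidates_per_word):
    """Choose the best-scoring candidate that fits each grammar slot.

    candidates_per_word: one list of (word, score) pairs per spoken word.
    Returns the corrected word list, or None if the command has the wrong
    length or some position has no grammatically legal candidate.
    """
    if len(candidates_per_word) != len(GRAMMAR_SLOTS):
        return None
    corrected = []
    for slot, candidates in zip(GRAMMAR_SLOTS, candidates_per_word):
        legal = [(word, score) for word, score in candidates if word in slot]
        if not legal:
            return None
        corrected.append(max(legal, key=lambda pair: pair[1])[0])
    return corrected


# The recognizer's top choice for the first word ("beam") is not a verb,
# so the grammar falls back to the lower-scoring candidate "weld".
recognized = [
    [("beam", 0.6), ("weld", 0.4)],
    [("the", 0.9)],
    [("beam", 0.7), ("arm", 0.2)],
]
print(correct_command(recognized))  # ['weld', 'the', 'beam']
```

A paragraph-level interface would go further than this sketch and also use knowledge of how successive commands combine into a task, which a per-sentence slot check cannot capture.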

It has been shown that, in the context of traditional keyboard interfaces, command paragraphs possess powerful error recovery properties [1, 2].

This paper describes a prototype which tests the paragraph approach with an interface using a speaker-dependent discrete-word recognizer. The prototype itself controls a simulated robotic arm which manipulates various objects in Earth orbit. The user can launch satellites, weld beams, etc., using only verbal commands. Feedback is provided by a computer-driven animation. Initial experience with the prototype is very encouraging. In one demanding trial, a command paragraph specifying a weld on a free-floating beam contained 10 separate command sentences for a total of 43 words. The test was conducted in a noisy environment and the operator was suffering from a cold. As a result, there were 17 recognition errors. A total of 16 errors were corrected by grammatical analysis and the remaining error by analysis of the task semantics. Thus the interface was able to deduce the correctly worded paragraph without recourse to correction dialogues. This is in contrast with the traditional approach, which would have required 17 corrections on a word-by-word basis.

The use of longer rather than shorter command sequences as input to speech interfaces represents a significant departure from the usual design paradigm. This paper describes a first attempt to study the effectiveness of the paragraph approach.