Realism and naturalness in a conversational multi-modal interface

G. Power, Robert I. Damper, W. Hall, G. B. Wills

As computing becomes ever more pervasive in everyday life, new interface metaphors are urgently required for mobile and multi-modal applications. In this paper, we consider the issues of realism and naturalness in virtual ‘talking head’ characters. Specifically, we address two questions: (1) What is the most appropriate degree of visual realism for a talking head, and does this vary with the degree of interaction? (2) To what extent should the naturalness of the synthetic speech match the realism of the talking head? We describe experiments that provide partial answers, in which subjects rated the interfaces on five attributes and also provided informal comments. The results suggest that users prefer an intermediate level of visual realism, perhaps because this best matches the underlying technology (animation, speech synthesis). Question (2) is hard to answer because naturalness is difficult to control in a synthesiser. Using three different text-to-speech (TTS) engines, we found that ratings on individual attributes varied with the synthesiser, although average overall scores were very similar. Interestingly, subjects were not always aware when different synthesisers were being employed.

