This paper deals with the modelling and perception of disfluencies in articulatory speech synthesis. The stimuli are embedded into short dialogues in question-answering situations in a human–machine scenario. The system is supposed to express uncertainty in the answer. We test the influence of delay, intonation, and filler as prosodic indicators of uncertainty on perception in two studies. Study 1 deals with the effect of delay and filler on uncertainty perception. Results suggest an additive effect of the cues, i.e. the activation of both prosodic cues of uncertainty has a stronger impact on uncertainty perception than the deactivation of a single cue or of both cues. With respect to the effect of single cues, no significant difference can be observed. Study 2 investigates the impact of delay and intonation on perceived uncertainty. Again, a principle of additivity can be observed. Furthermore as modelled here, intonation has a stronger influence than delay. In both studies no correlation between the ranking of uncertainty and naturalness of the stimuli is found.
Index Terms: uncertainty, disfluencies, speech synthesis, speech perception.