ISCA Archive AVSP 2013

Detecting auditory-visual speech synchrony: how precise?

Chris Davis, Jeesun Kim

Previous research suggests that people are rather poor at perceiving auditory-visual (AV) speech asynchrony, especially when the visual signal occurs first. However, estimates of AV synchrony detection depend on many factors, and previous measures may have underestimated its precision. Here we used a synchrony-driven search task to examine how accurately an observer could detect AV speech synchrony. On each trial, a participant viewed four videos (positioned at the cardinal points of a circle) that showed the lower face of a talker while hearing a spoken /ba/ syllable. One video had the original AV timing; in the others, the visual speech was shifted 100 ms, 200 ms or 300 ms earlier. Participants were required to conduct a speeded visual search for the synchronized face/voice token (the position of which was randomized). The results showed that the synchrony detection window was narrow, with 82% of responses selecting either the original unaltered video (29%) or the video in which the visual signal led by 100 ms (53%). These results suggest that an observer is able to judge AV speech synchrony with some precision.

Index Terms: Auditory-visual speech synchrony; Synchrony search task; Inter-sensory timing
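
The trial layout described in the abstract (four videos at the cardinal points, one with the original timing and three with the visual signal leading by 100, 200 or 300 ms, target position randomized) can be illustrated with a minimal sketch. This is not the authors' experiment code: the position labels, function names and the use of Python's `random` module are assumptions made purely for illustration.

```python
import random

# Visual-lead offsets in ms, taken from the abstract (0 = original AV timing).
VISUAL_LEADS_MS = [0, 100, 200, 300]

# Hypothetical labels for the four cardinal positions on the circle.
POSITIONS = ["top", "right", "bottom", "left"]


def make_trial(rng):
    """Randomly assign each visual-lead offset to one screen position."""
    leads = VISUAL_LEADS_MS[:]  # copy so the constant is not mutated
    rng.shuffle(leads)
    return dict(zip(POSITIONS, leads))


def target_position(trial):
    """Return the position holding the unaltered (0 ms) video."""
    return next(pos for pos, lead in trial.items() if lead == 0)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so the sketch is reproducible
    trial = make_trial(rng)
    print(trial, "target:", target_position(trial))
```

Shuffling the offsets on every trial means the synchronized token appears at an unpredictable position, which is what makes the task a search rather than a judgment about a fixed location.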