ISCA Archive AVSP 2013

Phonetic information in audiovisual speech is more important for adults than for infants; preliminary findings.

Martijn Baart, Jean Vroomen, Kathleen E. Shaw, Heather Bortfeld

Infants and adults are able to match auditory and visual speech, but the cues on which they rely may differ. Here we provide an initial assessment of the relative contribution of the temporal and phonetic cues available in the audiovisual (AV) signal. Adults (N=52) and infants (N=18) matched two trisyllabic speech sounds, presented either as natural speech or as sine-wave speech (SWS), with visual speech information. Adults saw two articulating faces and matched a sound to one of them, while infants were presented with the same stimuli in a preferential looking paradigm. Adults’ performance was almost flawless with natural speech but significantly less accurate with SWS. In contrast, infants matched the sound to the articulating face irrespective of whether it was natural speech or SWS. We propose that infants matched the AV signal based on temporal cues, whereas adults relied more heavily on phonetic cues. This is in line with the idea that lipreading improves with age.

Index Terms: Phonetic correspondence, temporal correspondence, audiovisual speech, sine-wave speech