ISCA Archive SpeechProsody 2006

Lombard speech: Auditory (A), Visual (V) and AV effects

Chris Davis, Jeesun Kim, Katja Grauwinkel, Hansjörg Mixdorff

This study examined Auditory (A) and Visual (V) speech (speech-related head and face movement) as a function of noise environment. Measures of AV speech were recorded from four talkers (three male, one female) for 10 sentences spoken in quiet and in four styles of background noise (Lombard speech). Auditory speech was analyzed in terms of overall intensity, duration, spectral tilt, and prosodic parameters obtained from Fujisaki-model-based parameterizations of F0 contours. Visual speech was analyzed in terms of Principal Components (PCs) of head and face movement. Compared to speech in quiet, Lombard speech was louder, longer in duration, had more energy at higher frequencies (particularly with babble speech), and had accent and phrase commands of greater mean amplitude. Visual Lombard speech showed a greater influence of the PCs associated with jaw and mouth movement, face expansion and contraction, and head rotation (pitch). Lombard speech also showed increased AV correlations between RMS speech intensity and the PCs that involved jaw and mouth movement; a similar increase occurred for the correlation between intensity and head rotation (pitch). For Lombard speech, all talkers showed an increased correlation between F0 and head translation (raising and lowering), whereas increases in F0 correlations with other head movements were more idiosyncratic. These findings suggest that the relationships underlying Audio-Visual speech perception differ depending on how the speech was produced.
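The abstract does not describe how the AV correlations were computed. As a rough illustration only, the sketch below shows one plausible way to relate frame-wise RMS intensity to a principal component of movement data. The function names (`pca_scores`, `frame_rms`, `pearson_r`), the frame length, and the synthetic signals are all assumptions made for this example and are not taken from the study.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project a (frames x features) movement matrix onto its leading PCs.

    Returns the PC scores and the fraction of variance each PC explains.
    Hypothetical helper: the study's actual PCA procedure is not specified.
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    var_ratio = (S ** 2) / np.sum(S ** 2)
    return scores, var_ratio[:n_components]


def frame_rms(signal, frame_len):
    """RMS intensity over consecutive non-overlapping frames."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt(np.mean(frames ** 2, axis=1))


def pearson_r(a, b):
    """Pearson correlation between two equal-length 1-D series."""
    return np.corrcoef(a, b)[0, 1]


def demo():
    # Synthetic stand-ins (assumed, not real data): a slow amplitude
    # envelope drives both the audio level and the simulated markers.
    rng = np.random.default_rng(0)
    n_frames, frame_len = 200, 160
    envelope = 1.0 + 0.8 * np.sin(2 * np.pi * np.arange(n_frames) / 50)
    audio = np.repeat(envelope, frame_len) * rng.standard_normal(n_frames * frame_len)
    weights = np.array([1.0, 0.8, -0.5, 0.3, 0.0, 0.2])
    motion = np.outer(envelope, weights) + 0.05 * rng.standard_normal((n_frames, 6))

    rms = frame_rms(audio, frame_len)
    scores, var_ratio = pca_scores(motion, n_components=2)
    # The sign of a PC is arbitrary, so report the magnitude of r.
    print("PC1 variance share:", round(float(var_ratio[0]), 3))
    print("|r(RMS, PC1)|:", round(abs(float(pearson_r(rms, scores[:, 0]))), 3))


if __name__ == "__main__":
    demo()
```

In a comparison like the one reported, such a correlation would be computed separately for the quiet and noise conditions and then compared. The movement signal would also need to be sampled at, or resampled to, the same frame rate as the intensity measure.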