ISCA Archive SpeechProsody 2024

A Cross-linguistic Study on Audiovisual Perception of Prosodic Prominence by Chinese and English Observers

Ran Bi, Marc Swerts

Speakers and their conversation partners use both auditory and visual cues (e.g., facial expressions) to highlight important information by making some words more prominent. Prior work on languages such as Dutch and English has shown that intonational markers such as pitch accents, duration and loudness, as well as facial expressions such as head, eyebrow and mouth movements, are often exploited to signal and interpret prosodic prominence. However, little is known about how observers with different linguistic backgrounds (Chinese and English) perceive prominence in these languages, and how audiovisual cues affect prominence perception in the two languages, especially when those cues are incongruent. Using naturally elicited stimuli from Chinese and English speakers, we conducted a perceptual experiment measuring both L1 and L2 observers' reaction time and accuracy in a task of judging which word in an utterance was prominent. The observers were exposed to both Chinese and English stimuli in three formats: audio-only, audiovisually congruent, and audiovisually incongruent. Results revealed that (1) visual cues were important for prominence perception, especially in incongruent stimuli; (2) observers of both languages identified prominence more easily and accurately in Chinese than in English; and (3) there was a consistent correlation between reaction time and accuracy.