ISCA Archive AVSP 2013

Differences in the audio-visual detection of word prominence from Japanese and English speakers

Martin Heckmann, Keisuke Nakamura, Kazuhiro Nakadai

We have previously shown that, for English speakers, information on the shape of the speaker's mouth is a powerful feature for the machine-based discrimination of prominent from non-prominent words. In this paper we extend our analysis to data from Japanese speakers. We compare the discrimination performance of the different acoustic and visual features we extract for the two languages. This comparison shows a much wider variability in discrimination scores across speakers and features in the English dataset than in the Japanese dataset. Previous work has hinted that Japanese listeners may perform worse than English listeners in perceiving visual speech and word prominence. Despite this, our discrimination scores are high and very similar for the English and Japanese speakers. This indicates that, at least as far as the speakers are concerned, prominence is signaled with a similar level of consistency in both languages.

Index Terms: prosody, prominence, visual, audio-visual, Japanese