ISCA Archive Interspeech 2015

Automatic detection of sentence prominence in speech using predictability of word-level acoustic features

Sofoklis Kakouros, Okko Räsänen

Automatic detection of prominence in speech is an important task for many spoken language applications. However, most previous approaches rely on a corpus annotated with prosodic labels in order to train classifiers, and therefore lack generality beyond high-resourced languages. In this paper, we propose an algorithm for the automatic detection of sentence prominence that does not require explicit prominence labels for training. The method is based on the finding that human perception of prominence correlates with the (un)predictability of prosodic trajectories. The proposed system takes speech as input and combines information from automatically detected syllabic nuclei and three prosodic features to estimate which words are prominent. Results are reported on a speech corpus with manually assigned prominence labels from twenty annotators, showing that the algorithmic output converges with the annotators' prominence responses with 86% accuracy.
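
The idea of scoring words by the unpredictability of a prosodic trajectory can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the abstract does not say how predictability is modeled, so the sketch uses a bigram model with add-one smoothing over a quantized feature track, a single feature instead of three, and word boundaries given as frame index spans. The function names `quantize`, `train_bigram`, and `word_surprisal` are hypothetical, and the syllabic nuclei information is left out entirely.

```python
import math
from collections import Counter

def quantize(values, lo, hi, levels):
    """Map each real value to an integer bin in [0, levels - 1]."""
    width = (hi - lo) / levels
    return [min(levels - 1, max(0, int((v - lo) / width))) for v in values]

def train_bigram(sequences, levels):
    """Return a function giving add-one smoothed P(cur | prev).

    Assumption: a bigram model stands in for whatever predictability
    model is actually used; no prominence labels are needed to train it.
    """
    pair_counts = Counter()
    prev_counts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1

    def prob(prev, cur):
        return (pair_counts[(prev, cur)] + 1) / (prev_counts[prev] + levels)

    return prob

def word_surprisal(seq, word_spans, prob):
    """Mean negative log-probability of the frames inside each word span.

    word_spans holds (start, end) frame indices, end exclusive.
    Frame 0 has no predecessor and is skipped.
    """
    scores = []
    for start, end in word_spans:
        terms = [-math.log(prob(seq[t - 1], seq[t]))
                 for t in range(max(start, 1), end)]
        scores.append(sum(terms) / len(terms) if terms else 0.0)
    return scores

# Toy data: unlabeled training tracks are flat, the test track has a jump.
LEVELS = 4
training = [[1, 1, 1, 1, 1, 1] for _ in range(5)]
prob = train_bigram(training, LEVELS)

test_track = [1, 1, 1, 3, 3, 1, 1]
spans = [(0, 3), (3, 5), (5, 7)]  # three hypothetical words
scores = word_surprisal(test_track, spans, prob)

# Word index 1 (the span containing the jump) scores highest.
most_prominent = max(range(len(scores)), key=scores.__getitem__)
```

In this toy setting the least predictable word is simply the one with the highest mean surprisal. A fuller version would have to combine several features and decide how to turn scores into binary prominence decisions, choices this sketch does not attempt to reproduce.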