This paper argues that prosodic annotation and modeling should be combined to facilitate analyses of prosodic functions that invariably require perceptual judgments. It compares perceptual annotations of prominent syllables and phrase boundaries with labels obtained by combining linguistic information from a TTS front-end, model-based prosodic features, and a model of perceived syllabic prominence from an earlier study. As expected, this automatic annotation of prosodic landmarks yields better results on read speech than on spontaneous speech. On average, 89% of perceptually prominent syllables were identified correctly, as was a similar percentage of prosodic boundaries. The result is a basic annotation of prosodic features that can subsequently be enriched with additional information for which perceptual judgments are indispensable.
Index Terms: Prosodic annotation, prosodic modeling, Fujisaki model, perceptual prominence
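Since the Fujisaki model is listed among the index terms, a minimal sketch of its standard formulation may be a useful reference: F0 is modeled in the log domain as a constant base frequency plus phrase commands (impulses) and accent commands (pedestals), each passed through a critically damped second-order system. All parameter values below are illustrative assumptions, not values from this paper.

```python
import math

def phrase_response(t, alpha=2.0):
    """Impulse response Gp(t) of the phrase control mechanism."""
    return alpha ** 2 * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response Ga(t) of the accent control mechanism, ceiling gamma."""
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma) if t >= 0 else 0.0

def fujisaki_f0(t, fb=100.0,
                phrases=((0.5, 0.0),),         # (amplitude Ap, onset time T0)
                accents=((0.4, 0.3, 0.6),)):   # (amplitude Aa, onset T1, offset T2)
    """F0 contour in Hz at time t: base frequency plus phrase and accent
    components, superimposed in the log-F0 domain."""
    ln_f0 = math.log(fb)
    for ap, t0 in phrases:
        ln_f0 += ap * phrase_response(t - t0)
    for aa, t1, t2 in accents:
        ln_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
    return math.exp(ln_f0)

# Usage: F0 starts at the base frequency, rises during the accent command,
# and decays back toward the base frequency afterwards.
f0_onset = fujisaki_f0(0.0)
f0_accent = fujisaki_f0(0.4)
f0_late = fujisaki_f0(5.0)
```

In an annotation pipeline of the kind described above, the extracted phrase and accent command parameters would serve as the model-based prosodic features.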