ISCA Archive Eurospeech 2001

Deriving document structure from prosodic cues

Martin Haase, Werner Kriechbaum, Gregor Möhler, Gerhard Stenzel

This study presents an approach for prosody-driven segmentation of speech data. The model is based solely on F0 contours and RMS envelopes; phoneme or word information from a speech recognizer is unnecessary. Using data from German broadcast news, we show how this prosodic information can be exploited to retrieve structural information about the spoken text. The suitability of a CART-like algorithm for utterance boundary prediction was evaluated on seven five-minute news reports, using 28 reports as training material for the classification tree. Sentence boundaries were predicted with a precision of 93% at a recall of 88%.
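
As an illustration of two quantities named in the abstract, the following minimal Python sketch computes a frame-wise RMS envelope and scores predicted boundaries by precision and recall. It is not the authors' implementation: the function names, the frame and hop sizes, the greedy one-to-one matching, and the 0.2 s tolerance are all assumptions made for the example, since the abstract does not specify them.

```python
import math


def rms_envelope(samples, frame_len, hop):
    """Root-mean-square energy of each full frame (frame_len and hop are assumed values)."""
    envelope = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        envelope.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return envelope


def boundary_precision_recall(predicted, reference, tolerance=0.2):
    """Score predicted boundary times (seconds) against reference times.

    A prediction counts as a hit if it lies within `tolerance` seconds of a
    reference boundary that has not been matched yet (greedy matching; the
    tolerance is a hypothetical choice, not taken from the paper).
    """
    unmatched = sorted(reference)
    hits = 0
    for t in sorted(predicted):
        # Closest reference boundary that is still unmatched, if any.
        best = min(unmatched, key=lambda r: abs(r - t), default=None)
        if best is not None and abs(best - t) <= tolerance:
            hits += 1
            unmatched.remove(best)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall
```

Precision here is the share of predicted boundaries that match a reference boundary, and recall is the share of reference boundaries that were found, which is how figures such as 93% and 88% are conventionally read.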