ISCA Archive Interspeech 2022

Leveraging Prosody for Punctuation Prediction of Spontaneous Speech

Yeonjin Cho, Sara Ng, Trang Tran, Mari Ostendorf

This paper introduces a new neural model that incorporates prosodic features to improve automatic punctuation prediction in transcripts of spontaneous speech. We explore the benefit of intonation and energy features over using pauses alone. In addition, the work raises the question of how to represent interruption points associated with disfluencies in spontaneous speech. In experiments on the Switchboard corpus, we find that prosodic information improves punctuation prediction fidelity for both hand transcripts and ASR output. Explicit modeling of interruption points can benefit prediction of standard punctuation, particularly if the convention associates interruptions with commas.