In this paper, we develop a sentence boundary detection system that incorporates a prosodic model, word-level and preterminal-level language models, and a global sentence-length model. An important aspect of this research was the investigation of crowdsourced punctuation annotations as a source of multiple references for evaluation. To evaluate the system, we propose a BLEU-like metric that compares a hypothesis against multiple references. Experiments on both reference transcripts and ASR output show that the global sentence-length model improves performance by 7.2% on reference transcripts and by 3.8% on ASR output.
Index Terms: sentence boundary detection, prosody, finite-state transducer, Amazon Mechanical Turk