ISCA Archive IWSLT 2006

Automatic sentence segmentation and punctuation prediction for spoken language translation

Evgeny Matusov, Arne Mauser, Hermann Ney

This paper studies the impact of automatic sentence segmentation and punctuation prediction on the quality of machine translation of automatically recognized speech. We present a novel sentence segmentation method that is specifically tailored to the requirements of machine translation algorithms and is competitive with state-of-the-art approaches for detecting sentence-like units. We also describe and compare three strategies for predicting punctuation in a machine translation framework, including simple and effective implicit punctuation generation by a statistical phrase-based machine translation system. Our experiments on the IWSLT Chinese-to-English and TC-STAR English-to-Spanish speech translation tasks show that the proposed sentence segmentation and punctuation prediction approaches perform robustly in terms of translation quality.