Insertion of proper segmentation and punctuation into an ASR transcript
is crucial not only for the performance of subsequent applications
but also for the readability of the text. In a simultaneous spoken
language translation system, the segmentation model has to fulfill
real-time constraints and minimize latency as well.
In this paper, we
show the successful integration of an attentional encoder-decoder-based
segmentation and punctuation insertion model into a real-time spoken
language translation system. The proposed technique can be easily integrated
into the real-time framework and improves punctuation performance
on reference transcripts as well as on ASR outputs. Our experiments on
end-to-end spoken language translation show that, compared to the
conventional language model and prosody-based model, adopting the NMT-based
punctuation model improves translation performance by 1.3 BLEU points
while maintaining low latency.