ISCA Archive ICSLP 2002

An automatic sentence boundary detector based on a structured language model

Shinsuke Mori

In this paper we describe an automatic sentence boundary detector, which inserts periods (sentence boundary markers) into a word sequence output by a speech recognizer. State-of-the-art automatic sentence boundary detectors insert a period at positions selected by a word tri-gram model from among candidates (long pauses) offered by an acoustic model. In contrast, the detector presented in this paper is based on a structured language model (SLM), which regards a sentence as a word sequence with a syntactic structure. In the experiment we applied our detector to Japanese broadcast lectures and compared the result with that of a detector based on a word tri-gram model. The accuracy of our detector was 95.7%, which was higher than that of the state-of-the-art detector (95.2%). This result shows that an SLM works better than a word tri-gram model as the basis for an automatic sentence boundary detector.
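To make the tri-gram baseline concrete, the following is a minimal sketch of how a period might be chosen from among pause candidates. It is not the author's implementation and it does not cover the SLM. The function names (`train_trigram`, `sequence_logp`, `insert_boundaries`), the add-alpha smoothing, the greedy left-to-right decision rule, and the toy English training data are all assumptions made for illustration.

```python
import math
from collections import defaultdict

BOUNDARY = "."  # sentence boundary marker (the "period")


def train_trigram(sentences, alpha=0.1):
    """Estimate add-alpha smoothed tri-gram probabilities (smoothing is an assumption).

    Sentences are concatenated into one stream with a boundary marker after
    each, so the model sees word contexts that span sentence boundaries.
    """
    stream = [BOUNDARY, BOUNDARY]
    vocab = {BOUNDARY}
    for sent in sentences:
        stream.extend(sent)
        stream.append(BOUNDARY)
        vocab.update(sent)

    tri, bi = defaultdict(int), defaultdict(int)
    for i in range(2, len(stream)):
        tri[(stream[i - 2], stream[i - 1], stream[i])] += 1
        bi[(stream[i - 2], stream[i - 1])] += 1
    size = len(vocab)

    def logp(w, h1, h2):
        # log P(w | h1 h2) with add-alpha smoothing
        return math.log((tri[(h1, h2, w)] + alpha) / (bi[(h1, h2)] + alpha * size))

    return logp


def sequence_logp(tokens, logp):
    """Total tri-gram log-probability of a token sequence."""
    padded = [BOUNDARY, BOUNDARY] + list(tokens)
    return sum(logp(padded[i], padded[i - 2], padded[i - 1])
               for i in range(2, len(padded)))


def insert_boundaries(tokens, candidates, logp):
    """Greedily insert a period at each candidate position (an index into
    `tokens`, e.g. a long pause) if doing so raises the sequence score."""
    out = list(tokens)
    shift = 0  # number of markers inserted so far
    for pos in sorted(candidates):
        at = pos + shift
        trial = out[:at] + [BOUNDARY] + out[at:]
        if sequence_logp(trial, logp) > sequence_logp(out, logp):
            out = trial
            shift += 1
    return out


# Toy data, purely illustrative.
toy_logp = train_trigram([
    ["i", "like", "tea"],
    ["you", "like", "coffee"],
    ["i", "like", "coffee"],
    ["you", "like", "tea"],
])
words = "i like tea you like coffee".split()
print(insert_boundaries(words, candidates=[1, 3], logp=toy_logp))
```

In this sketch the acoustic model is reduced to a list of candidate indices. A detector of the kind the abstract describes would replace the tri-gram score with the probability assigned by a structured language model, which is outside the scope of this example.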