ISCA Archive SSPR 2003

Filler and disfluency identification based on morphological analysis and chunking

Masayuki Asahara, Yuji Matsumoto

We propose a novel filler/disfluency identification method for transcriptions of spontaneous Japanese speech. Our method is based on Japanese morphological analysis and chunking. First, a statistical morphological analyzer processes each input sentence and produces redundant outputs, that is, several candidate analyses rather than a single best one. Because fillers and disfluencies introduce ambiguity into morphological analysis, keeping these redundant outputs lets us take into account several possible roles for each character in the input. Second, a support vector machine-based chunker labels some of the ambiguous points as fillers or disfluencies. Although the method cannot detect disfluencies of function words satisfactorily, it achieves high performance for fillers and for disfluencies of content words.
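
The first step can be illustrated with a minimal sketch of how redundant morphological analyses might be turned into per-character features for a chunker. The abstract does not specify the feature encoding, so everything here is an assumption made for illustration: the `Candidate` tuple layout, the position tags `S`/`B`/`I`/`E`, the part-of-speech labels, and the toy candidate list. A classifier such as a support vector machine would then consume these feature sets; that step is not shown.

```python
from typing import List, Tuple

# Hypothetical candidate morpheme from a redundant (n-best) analysis:
# (start offset, end offset exclusive, part-of-speech label).
Candidate = Tuple[int, int, str]


def position_tag(start: int, end: int, i: int) -> str:
    """Position of character i inside the span [start, end).

    S = single-character morpheme, B = first character,
    E = last character, I = any character in between.
    (This tag set is an assumption, not taken from the abstract.)
    """
    if end - start == 1:
        return "S"
    if i == start:
        return "B"
    if i == end - 1:
        return "E"
    return "I"


def char_features(text: str, candidates: List[Candidate]) -> List[List[str]]:
    """Collect, for every character, all roles it plays across candidates."""
    feats = [set() for _ in text]
    for start, end, pos in candidates:
        for i in range(start, end):
            feats[i].add(f"{position_tag(start, end, i)}-{pos}")
    # Sort so the output is deterministic.
    return [sorted(f) for f in feats]


if __name__ == "__main__":
    # Toy input: the character at index 2 is ambiguous between the end of
    # a filler and a standalone particle.
    text = "えーと今日"
    candidates = [(0, 3, "Filler"), (2, 3, "Particle"), (3, 5, "Noun")]
    for ch, f in zip(text, char_features(text, candidates)):
        print(ch, f)
```

In this toy input the third character receives two competing roles, which is the kind of ambiguous point a chunker would have to resolve.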