ISCA Archive SSPR 2003

Morphological analysis of the Corpus of Spontaneous Japanese

Kiyotaka Uchimoto, Chikashi Nobata, Atsushi Yamada, Satoshi Sekine, Hitoshi Isahara

This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and a method for accurately tagging a large spontaneous speech corpus. We show that, with semi-automatic analysis, we can expect a precision of over 99% for detecting and tagging short words and of 97% for long words, the two types of word units that make up the corpus.
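Precision here can be read as the share of words output by the analyzer whose boundaries (and, for tagging, whose morphological information) match the reference annotation. The sketch below assumes that standard span-based definition. The function names, the `(surface, tag)` word representation, and the tag labels are hypothetical, and the paper's exact evaluation criteria may differ.

```python
from typing import List, Tuple

# Hypothetical representation: each word is (surface form, morphological tag).
Word = Tuple[str, str]


def to_units(words: List[Word]) -> List[Tuple[int, int, str]]:
    """Convert a word sequence into (start, end, tag) character-offset units."""
    units, pos = [], 0
    for surface, tag in words:
        units.append((pos, pos + len(surface), tag))
        pos += len(surface)
    return units


def precision(gold: List[Word], system: List[Word], use_tags: bool = True) -> float:
    """Fraction of system words whose span (and tag, if use_tags) matches gold.

    With use_tags=False this measures segmentation precision only; with
    use_tags=True a word counts as correct only if its tag also matches.
    """
    def key(unit):
        start, end, tag = unit
        return (start, end, tag) if use_tags else (start, end)

    gold_keys = {key(u) for u in to_units(gold)}
    sys_units = to_units(system)
    if not sys_units:
        return 0.0
    correct = sum(1 for u in sys_units if key(u) in gold_keys)
    return correct / len(sys_units)


# Usage: boundaries all match, but one tag is wrong.
gold = [("今日", "noun"), ("は", "particle"), ("晴れ", "noun")]
system = [("今日", "noun"), ("は", "particle"), ("晴れ", "verb")]
print(precision(gold, system, use_tags=False))  # segmentation precision
print(precision(gold, system))                  # segmentation + tagging precision
```

Separating the two modes mirrors the distinction in the abstract between detecting word segments and tagging them with morphological information.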