ISCA Archive SSPR 2003

Corpus of spontaneous Japanese: its design and evaluation

Kikuo Maekawa

The Corpus of Spontaneous Japanese (CSJ) is a large-scale database of spontaneous Japanese. It contains speech signals and transcriptions of about 7 million words, along with various annotations such as part-of-speech (POS) and phonetic labels. This paper describes the design issues of the CSJ and then presents a preliminary evaluation of the corpus. The results strongly suggest that the CSJ is a useful resource for the study of spontaneous speech.