ISCA Archive Eurospeech 1997

Toward automatic transcription of Japanese broadcast news

Tatsuo Matsuoka, Yuichi Taguchi, Katsutoshi Ohtsuki, Sadaoki Furui, Katsuhiko Shirai

In this paper, we report on the automatic recognition of Japanese broadcast-news speech. We have been working on large-vocabulary continuous speech recognition (LVCSR) for Japanese newspaper speech transcription and have achieved good performance. We have recently applied our LVCSR system to transcribing Japanese broadcast-news speech. We extended the vocabulary from 7k words to 20k words and trained two language models, one on newspaper texts and one on broadcast-news manuscripts. Both language models were applied to our evaluation speech sets. The language model trained on broadcast-news manuscripts achieved better results for broadcast-news speech, whereas the language model trained on newspaper texts achieved better results for newspaper speech. For anchor-speaker speech, we achieved a word error rate of 19.7% by using a bigram language model and a trigram language model, both trained on broadcast-news manuscripts.
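
The word error rate quoted above is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The abstract does not describe the scoring procedure, so the following is only a minimal sketch of that standard definition, not the authors' tool. The function name `word_error_rate` is hypothetical, and the inputs are assumed to be already segmented into word tokens (Japanese text has no whitespace word boundaries, so segmentation would have to happen beforehand).

```python
def word_error_rate(reference, hypothesis):
    """Return the word error rate between two token lists.

    Assumption: both arguments are lists of already-segmented word tokens.
    WER = (substitutions + deletions + insertions) / len(reference),
    computed here as the word-level Levenshtein distance.
    """
    if not reference:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]  # aligning i reference words with nothing costs i deletions
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(reference)


# Illustrative tokens only, not data from the paper:
# one substitution ("b" -> "x") and one deletion ("d") over four reference words.
example = word_error_rate(["a", "b", "c", "d"], ["a", "x", "c"])
```

Because insertions count as errors, this quantity can exceed 100% when the hypothesis is much longer than the reference.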