ISCA Archive Interspeech 2007

String and lattice based discriminative training for the corpus of spontaneous Japanese lecture transcription task

Erik McDermott, Atsushi Nakamura

This article provides a comprehensive set of acoustic model discriminative training results for the Corpus of Spontaneous Japanese (CSJ) lecture speech transcription task. Discriminative training was carried out for this task with a 100,000-word trigram language model, for several acoustic model topologies, with both diagonal and full covariance models, and under both string-based and lattice-based training paradigms. We describe our implementation of the proposal by Macherey et al. for numerical subtraction of the reference lattice statistics from the competitor lattice statistics during lattice-based Minimum Classification Error (MCE) training. We also present results for lattice-based training that does not use such subtraction, which corresponds to the well-known Maximum Mutual Information (MMI) approach. Discriminative training yielded relative reductions in Word Error Rate of up to 13%. Specific problems encountered in implementing discriminative training for this task are also discussed.
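The difference between the two lattice-based criteria can be illustrated with a minimal sketch at the level of a single utterance. This is not the authors' implementation: it assumes the lattice has already been reduced to two log-domain scores, the total score summed over all paths (which includes the reference) and the score of the reference alone. The function names `log_sub`, `mce_loss`, and `mmi_objective`, the sigmoid slope parameter, and the scores in the usage note are all hypothetical.

```python
import math


def log_sub(log_a, log_b):
    """Return log(exp(log_a) - exp(log_b)) in a numerically stable way.

    Assumes log_a > log_b, i.e. the total lattice score strictly
    exceeds the reference score.
    """
    if log_b >= log_a:
        raise ValueError("log_a must be strictly greater than log_b")
    return log_a + math.log1p(-math.exp(log_b - log_a))


def mce_loss(log_total, log_ref, slope=1.0):
    """Sigmoid MCE loss with the reference subtracted from the competitors.

    log_total: log-score summed over all lattice paths (hypothetical input).
    log_ref:   log-score of the reference transcription (hypothetical input).
    """
    # Competitor score: total lattice score minus the reference contribution.
    log_competitors = log_sub(log_total, log_ref)
    # Misclassification measure: positive when competitors outscore the reference.
    d = log_competitors - log_ref
    return 1.0 / (1.0 + math.exp(-slope * d))


def mmi_objective(log_total, log_ref):
    """MMI-style objective: log posterior of the reference, with no subtraction."""
    return log_ref - log_total
```

As a usage example under these assumptions, a reference holding one quarter of the total lattice probability mass (`log_total = 0.0`, `log_ref = math.log(0.25)`) gives a misclassification measure of log 3 and an MCE loss of 0.75 at slope 1, while the MMI-style objective is log 0.25. With slope 1 the MCE loss reduces to one minus the reference posterior, which is one way to see how closely the two criteria are related.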