ISCA Archive Interspeech 2012

Performance comparison of training algorithms for semi-supervised discriminative language modeling

Erinç Dikici, Arda Çelebi, Murat Saraçlar

Discriminative language modeling (DLM) has been shown to improve the accuracy of automatic speech recognition (ASR) systems, but it requires large amounts of both acoustic and text data for training. One way to reduce this requirement is to train on simulated hypotheses instead of real ones, an approach called semi-supervised training. In this study, we compare six different perceptron algorithms under the semi-supervised training approach. We formulate DLM as both a structured prediction problem and a reranking problem, optimizing a different criterion in each. We find that the ranking variants achieve similar or better word error rate (WER) reduction than structured perceptrons when trained on real data, simulated data, or a combination of the two.

Index Terms: discriminative training, semi-supervised learning, language modeling, hypothesis simulation, ranking perceptron
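
The difference between the two formulations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the six perceptron variants, so the code below shows only a generic structured perceptron update and a generic pairwise ranking perceptron update on one N-best list. The feature vectors, the learning rate `lr`, and the `margin` parameter are assumptions introduced for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (feature vector, number of word errors).
Hypothesis = Tuple[List[float], int]


def dot(w: List[float], x: List[float]) -> float:
    """Inner product of the weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def structured_update(w: List[float], nbest: List[Hypothesis],
                      lr: float = 1.0) -> List[float]:
    """Structured prediction view: move the weights toward the oracle
    (lowest-error) hypothesis and away from the current top-scoring one."""
    oracle = min(nbest, key=lambda h: h[1])
    best = max(nbest, key=lambda h: dot(w, h[0]))
    if best[1] > oracle[1]:
        return [wi + lr * (o - b) for wi, o, b in zip(w, oracle[0], best[0])]
    return list(w)


def ranking_update(w: List[float], nbest: List[Hypothesis],
                   lr: float = 1.0, margin: float = 0.0) -> List[float]:
    """Reranking view: for every pair where hypothesis a has fewer errors
    than hypothesis b, update if a does not outscore b by more than margin."""
    w = list(w)
    for a_feats, a_err in nbest:
        for b_feats, b_err in nbest:
            if a_err < b_err and dot(w, a_feats) - dot(w, b_feats) <= margin:
                w = [wi + lr * (a - b) for wi, a, b in zip(w, a_feats, b_feats)]
    return w


# Toy 2-best list: the second hypothesis has fewer word errors.
nbest = [([1.0, 0.0], 2), ([0.0, 1.0], 0)]
w_struct = structured_update([0.0, 0.0], nbest)
w_rank = ranking_update([0.0, 0.0], nbest)
```

The structured update looks only at the oracle and the current best hypothesis, whereas the ranking update considers every misordered pair in the list, so it optimizes the ordering of the whole N-best list and not just its top entry.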