ISCA Archive Interspeech 2011

Data sampling and dimensionality reduction approaches for reranking ASR outputs using discriminative language models

Erinç Dikici, Murat Semerci, Murat Saraçlar, Ethem Alpaydın

This paper investigates various approaches to data sampling and dimensionality reduction for discriminative language models (DLM). DLM is a feature-based language modeling approach that aims to rerank the ASR output using discriminatively trained feature parameters. Using a Turkish morphology-based feature set, we examine online Principal Component Analysis (PCA) as a dimensionality reduction method. We use the ranking perceptron and the ranking SVM as two alternative discriminative modeling techniques, and apply data sampling to improve their efficiency. We obtain a reduction in word error rate (WER) of 0.4%, significant at p < 0.001, over the baseline perceptron result.
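
To make the reranking idea concrete, the following is a minimal sketch of a pairwise ranking perceptron over N-best lists. It is not the authors' implementation: the function names, the three-dimensional toy feature vectors, the word error counts, and the learning rate are all hypothetical, and neither data sampling nor online PCA is shown. It assumes each hypothesis is given as a dense feature vector paired with its number of word errors.

```python
from itertools import combinations


def dot(w, x):
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_ranking_perceptron(nbest_lists, n_features, epochs=10, lr=1.0):
    """Learn weights so that hypotheses with fewer word errors score higher.

    nbest_lists: list of N-best lists; each hypothesis is a
    (features, word_errors) pair. All names here are illustrative.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for hyps in nbest_lists:
            for (fa, ea), (fb, eb) in combinations(hyps, 2):
                if ea == eb:
                    continue  # no preference between equally accurate hypotheses
                better, worse = (fa, fb) if ea < eb else (fb, fa)
                # Update only when the better hypothesis is not scored strictly higher.
                if dot(w, better) <= dot(w, worse):
                    w = [wi + lr * (b - c) for wi, b, c in zip(w, better, worse)]
    return w


def rerank(w, hyps):
    """Return the index of the highest-scoring hypothesis in one N-best list."""
    return max(range(len(hyps)), key=lambda i: dot(w, hyps[i][0]))


# Toy data (hypothetical): two utterances, each hypothesis = (features, word_errors).
train = [
    [([1.0, 0.0, 1.0], 2), ([0.0, 1.0, 1.0], 0), ([1.0, 1.0, 0.0], 1)],
    [([1.0, 0.0, 0.0], 3), ([0.0, 1.0, 0.0], 1)],
]
weights = train_ranking_perceptron(train, n_features=3)
best = [rerank(weights, hyps) for hyps in train]
```

Because every unequal pair in an N-best list produces a candidate update, training cost grows quadratically with list size, which is one reason sampling a subset of hypotheses or pairs can improve efficiency.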