In this work, we investigate the feasibility of applying our prior works on discriminative training (DT) using non-uniform criteria to a keyword spotting task on spontaneous conversational speech. One of DT methods, minimum classification error (MCE), is recast and efficiently implemented in the weighted finite state transducer (WFST) framework to fit a keyword spotting task. To validate our approach, we evaluate it on a conversational speech task, the credit card use subset of Switchboard, in both kinds of keyword spotting scenarios: one is when a large vocabulary continuous speech recognition (LVCSR) decoder is available, the other is when a simple word-loop grammar of limited vocabulary is used. The results show our approach performs well in both cases, achieving 2.77% and 3.15% figure of merits (FOMs) absolute improvements
Index Terms: LVCSR, keyword spotting, DT, non-uniform criteria, WFST