ISCA Archive Interspeech 2014

An in-depth comparison of keyword specific thresholding and sum-to-one score normalization

Yun Wang, Florian Metze

We compare several approaches, separately and in combination, for spotting out-of-vocabulary (OOV) keywords, measured by their ATWV scores. We considered three types of recognition units (whole words, syllables, and subwords of different lengths) and two basic search strategies (whole-unit search and fuzzy phonetic search). In all cases, the search was performed by collapsing the recognition lattice into a consensus network, either over the recognized whole units or after first splitting the recognized units into phonemes. We ran experiments on five languages, for which the language model and vocabulary were derived from only 10 hours of transcriptions (70k-100k words of text), resulting in keyword OOV rates from 10% to 63% on new data, depending on the language. We conclude that: 1) in all cases, fuzzy phonetic search on phoneme-split lattices outperforms searching for whole units; 2) syllables are the best of the subword units for OOV keyword detection using fuzzy phonetic search; and 3) these methods combine very well, sometimes yielding ATWV scores for OOV terms that are not far below those of in-vocabulary (IV) terms.
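
The abstract reports results as ATWV scores without defining the metric. As a rough illustration only, the sketch below computes ATWV under the usual NIST spoken term detection convention: per-keyword term-weighted value is 1 - (P_miss + beta * P_FA) with beta = 999.9, averaged over keywords that occur in the reference. The names `KeywordCounts` and `atwv` are hypothetical, the one-trial-per-second approximation for non-target trials is an assumption, and none of this is taken from the authors' scoring setup.

```python
from dataclasses import dataclass

BETA = 999.9  # weight on false alarms in the standard TWV definition (assumed here)


@dataclass
class KeywordCounts:
    """Hypothetical per-keyword tallies after aligning detections to the reference."""
    n_true: int         # reference occurrences of the keyword
    n_correct: int      # detections that match a reference occurrence
    n_false_alarm: int  # detections that match no reference occurrence


def atwv(counts, speech_seconds, beta=BETA):
    """Average the term-weighted value over keywords that occur in the reference."""
    scored = [c for c in counts if c.n_true > 0]
    if not scored:
        raise ValueError("no keyword has a reference occurrence")
    total = 0.0
    for c in scored:
        p_miss = 1.0 - c.n_correct / c.n_true
        # Non-target trials are approximated as one per second of speech.
        p_fa = c.n_false_alarm / (speech_seconds - c.n_true)
        total += 1.0 - (p_miss + beta * p_fa)
    return total / len(scored)
```

Under this definition a system that finds every occurrence with no false alarms scores 1.0, and a system that outputs nothing scores 0.0. Because beta is large, even a few false alarms on a rare keyword can push its term-weighted value below zero.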