ISCA Archive Interspeech 2014

Comparing decoding strategies for subword-based keyword spotting in low-resourced languages

William Hartmann, Viet-Bac Le, Abdel Messaoudi, Lori Lamel, Jean-Luc Gauvain

For languages with limited training resources, out-of-vocabulary (OOV) words are a significant problem for both transcription and keyword spotting. This paper investigates the use of subword lexical units for keyword spotting. Three strategies for using the subword units are explored: 1) converting word-based lattices to subword lattices after decoding, 2) performing a separate decoding for each subword type, and 3) performing a single decoding using all possible subword units. In these experiments, the best performance is achieved by carrying out a separate decoding for each subword type. Further gains are attained through system combination. We also find that ignoring word boundaries improves the detection of OOV keywords without significantly impacting in-vocabulary keyword detection. Results are presented on four languages from the IARPA Babel Program (Haitian Creole, Assamese, Bengali, and Zulu).
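
The effect of ignoring word boundaries can be illustrated with a toy sketch. It is not the authors' system: it searches a single best hypothesis rather than a lattice, the function name `find_unit_matches` is hypothetical, and the subword units in the example are invented syllable-like strings chosen only for illustration.

```python
from typing import List


def find_unit_matches(
    hyp_words: List[List[str]],
    keyword_units: List[str],
    ignore_word_boundaries: bool,
) -> List[int]:
    """Return start indices, in the flattened unit sequence, of keyword matches.

    hyp_words is a hypothesis in which each word is a list of subword units.
    This is an illustrative assumption, not the representation used in the paper.
    """
    n = len(keyword_units)
    if n == 0:
        return []

    # Flatten the hypothesis into one sequence of subword units.
    flat = [unit for word in hyp_words for unit in word]

    # Record the unit positions where words start and end (end is exclusive).
    starts, ends, pos = set(), set(), 0
    for word in hyp_words:
        starts.add(pos)
        pos += len(word)
        ends.add(pos)

    hits = []
    for i in range(len(flat) - n + 1):
        if flat[i:i + n] != keyword_units:
            continue
        # When boundaries are respected, the match must begin at a word
        # start and finish at a word end.
        if ignore_word_boundaries or (i in starts and i + n in ends):
            hits.append(i)
    return hits


# Invented example: two hypothesized words, each split into two units.
hypothesis = [["ka", "ta"], ["lo", "gu"]]
# A keyword whose units straddle the boundary between the two words.
straddling = ["ta", "lo"]
# A keyword aligned with the second word.
aligned = ["lo", "gu"]
```

In this sketch the straddling keyword is found only when `ignore_word_boundaries` is true, while the aligned keyword is found in both modes. This mirrors, in a very simplified form, why relaxing boundaries can help with keywords that the word-level vocabulary does not cover.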


doi: 10.21437/Interspeech.2014-528

Cite as: Hartmann, W., Le, V.-B., Messaoudi, A., Lamel, L., Gauvain, J.-L. (2014) Comparing decoding strategies for subword-based keyword spotting in low-resourced languages. Proc. Interspeech 2014, 2764-2768, doi: 10.21437/Interspeech.2014-528

@inproceedings{hartmann14_interspeech,
  author={William Hartmann and Viet-Bac Le and Abdel Messaoudi and Lori Lamel and Jean-Luc Gauvain},
  title={{Comparing decoding strategies for subword-based keyword spotting in low-resourced languages}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={2764--2768},
  doi={10.21437/Interspeech.2014-528},
  issn={2308-457X}
}