ISCA Archive Interspeech 2014

Combination of FST and CN search in spoken term detection

Justin Chiu, Yun Wang, Jan Trmal, Daniel Povey, Guoguo Chen, Alexander I. Rudnicky

Spoken Term Detection (STD) focuses on finding instances of a particular spoken word or phrase in an audio corpus. Most STD systems have a two-step pipeline: automatic speech recognition (ASR) followed by search. Two approaches to search are common: Confusion Network (CN) based search and Finite State Transducer (FST) based search. In this paper, we examine the combination of these two search approaches, using the same ASR output. We find that CN search performs better on shorter queries, and FST search performs better on longer queries. By combining the search results from the same ASR decoding, we achieve better performance than either search approach on its own. We also find that this improvement is additive to the usual combination of decoder results using different modeling techniques.
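
The abstract does not say how the two result lists are combined, so the following is only an illustrative sketch of one generic way to merge detections from two search passes over the same ASR output. It assumes each search returns scored detections with time stamps, merges overlapping detections of the same term by a weighted score sum, and keeps unmatched detections at their own weighted score. The `Detection` type, the `combine` function, and the weights `w_cn` and `w_fst` are hypothetical names introduced here, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One hypothesized occurrence of a query term (hypothetical structure)."""
    term: str
    utt: str      # utterance or file identifier
    start: float  # seconds
    end: float    # seconds
    score: float  # detection confidence in [0, 1]


def overlaps(a: Detection, b: Detection) -> bool:
    """True if both detections are for the same term and overlap in time."""
    return (a.term == b.term and a.utt == b.utt
            and a.start < b.end and b.start < a.end)


def combine(cn_hits, fst_hits, w_cn=0.5, w_fst=0.5):
    """Merge two detection lists with a weighted score sum.

    Assumption: this is a generic merging rule chosen for illustration,
    not the combination method used in the paper.
    """
    merged = []
    used = set()  # indices of FST detections already merged
    for c in cn_hits:
        match = next((i for i, f in enumerate(fst_hits)
                      if i not in used and overlaps(c, f)), None)
        if match is None:
            # Found by CN search only: keep it with its weighted score.
            merged.append(Detection(c.term, c.utt, c.start, c.end,
                                    w_cn * c.score))
        else:
            # Found by both searches: span the union, sum weighted scores.
            f = fst_hits[match]
            used.add(match)
            merged.append(Detection(c.term, c.utt,
                                    min(c.start, f.start),
                                    max(c.end, f.end),
                                    w_cn * c.score + w_fst * f.score))
    # Detections found by FST search only.
    for i, f in enumerate(fst_hits):
        if i not in used:
            merged.append(Detection(f.term, f.utt, f.start, f.end,
                                    w_fst * f.score))
    return sorted(merged, key=lambda d: d.score, reverse=True)


cn = [Detection("hello", "u1", 1.0, 1.5, 0.8),
      Detection("hello", "u2", 3.0, 3.4, 0.4)]
fst = [Detection("hello", "u1", 1.1, 1.6, 0.6),
       Detection("hello", "u3", 0.0, 0.5, 0.9)]
results = combine(cn, fst)
```

Under this rule a detection confirmed by both searches outranks one found by only a single search. The weights could in principle depend on query length, given the finding that CN search is stronger on shorter queries and FST search on longer ones, but that extension is likewise an assumption rather than something the abstract describes.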

doi: 10.21437/Interspeech.2014-532

Cite as: Chiu, J., Wang, Y., Trmal, J., Povey, D., Chen, G., Rudnicky, A.I. (2014) Combination of FST and CN search in spoken term detection. Proc. Interspeech 2014, 2784-2788, doi: 10.21437/Interspeech.2014-532

  author={Justin Chiu and Yun Wang and Jan Trmal and Daniel Povey and Guoguo Chen and Alexander I. Rudnicky},
  title={{Combination of FST and CN search in spoken term detection}},
  booktitle={Proc. Interspeech 2014},