ISCA Archive Interspeech 2022

Space-Efficient Representation of Entity-centric Query Language Models

Christophe Van Gysel, Mirko Hannemann, Ernest Pusateri, Youssef Oualil, Ilya Oparin

Virtual assistants make use of automatic speech recognition (ASR) to help users answer entity-centric queries. However, spoken entity recognition is a difficult problem, due to the large number of frequently-changing named entities. In addition, resources available for recognition are constrained when ASR is performed on-device. In this work, we investigate the use of probabilistic grammars as language models within the finite-state transducer (FST) framework. We introduce a deterministic approximation to probabilistic grammars that avoids the explicit expansion of non-terminals at model creation time, integrates directly with the FST framework, and is complementary to n-gram models. We obtain a 10% relative word error rate improvement on long tail entity queries compared to when a similarly-sized n-gram model is used without our method.
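The core idea of avoiding explicit non-terminal expansion can be illustrated with a toy sketch (this is a conceptual illustration only, not the paper's FST-based implementation; the grammar, entity list, and `score` function are hypothetical): the grammar keeps a non-terminal token whose probability mass is delegated to a separate entity distribution at query time, so the entity list can change without rebuilding the grammar.

```python
# Conceptual sketch: a tiny query grammar with a non-terminal that is
# expanded lazily at scoring time rather than enumerated at build time.
NONTERMINAL = "$ENTITY"

# Hypothetical bigram-style grammar: P(word | previous word).
grammar = {
    ("<s>",): {"play": 0.9, "call": 0.1},
    ("play",): {NONTERMINAL: 1.0},
    ("call",): {NONTERMINAL: 1.0},
}

# Entity distribution, maintained separately; updating it does not
# require touching (or re-expanding) the grammar above.
entities = {"beatles": 0.6, "madonna": 0.4}

def score(words):
    """Return P(words) under the grammar, expanding $ENTITY lazily."""
    prob = 1.0
    history = ("<s>",)
    for w in words:
        dist = grammar.get(history, {})
        if w in dist:
            prob *= dist[w]
        elif NONTERMINAL in dist and w in entities:
            # Delegate the non-terminal's mass to the entity distribution.
            prob *= dist[NONTERMINAL] * entities[w]
        else:
            return 0.0
        history = (w,)
    return prob

print(score(["play", "beatles"]))  # 0.9 * 1.0 * 0.6 = 0.54
```

In the paper's actual setting, the analogous delegation happens inside the finite-state transducer framework, where the non-terminal arcs are resolved against entity grammars during decoding instead of being statically composed into the model.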

doi: 10.21437/Interspeech.2022-193

Cite as: Van Gysel, C., Hannemann, M., Pusateri, E., Oualil, Y., Oparin, I. (2022) Space-Efficient Representation of Entity-centric Query Language Models. Proc. Interspeech 2022, 679-683, doi: 10.21437/Interspeech.2022-193

@inproceedings{vangysel22_interspeech,
  author={Christophe {Van Gysel} and Mirko Hannemann and Ernest Pusateri and Youssef Oualil and Ilya Oparin},
  title={{Space-Efficient Representation of Entity-centric Query Language Models}},
  year={2022},
  booktitle={Proc. Interspeech 2022},
  pages={679--683},
  doi={10.21437/Interspeech.2022-193}
}