ISCA Archive Eurospeech 1997

Experiments in spoken queries for document retrieval

J. Barnett, S. Anderson, J. Broglio, M. Singh, R. Hudson, S. W. Kuo

We report the results of three experiments that use the errorful output of a large-vocabulary continuous speech recognition (LVCSR) system as the input to a statistical information retrieval (IR) system. Our goal is to allow a user to speak, rather than type, query terms into an IR engine and still obtain relevant documents. The experiments test whether IR systems are robust to the errors that the speech recognizer introduces into the query terms. If the correctly recognized words in the search query outweigh the misinformation from the incorrectly recognized words, the relevant documents will still be retrieved. This paper presents evidence that speech-driven IR can be effective, although with reduced precision. We also find that longer spoken queries yield higher-precision retrieval than shorter ones. For queries containing many (50-60) search terms and a recognizer word error rate (WER) of 27.9%, the precision at 30 documents retrieved is degraded by only 11.1%. At roughly the same WER, however, queries shorter than 10-15 words suffer a loss of precision of more than 30%.
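The two measures quoted in the abstract, word error rate and precision at 30 documents retrieved, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes the conventional definitions: WER as word-level edit distance divided by reference length, and precision at k as the fraction of the top k retrieved documents that are relevant. The function names and the helper `relative_loss` are hypothetical, and the abstract does not say whether the reported degradation is relative or absolute; the sketch assumes relative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def precision_at_k(ranked_ids, relevant_ids, k=30):
    """Fraction of the top-k retrieved documents that are relevant.

    Divides by k even if fewer than k documents were retrieved.
    """
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k


def relative_loss(typed_precision, spoken_precision):
    """Assumed reading of 'degraded by X%': loss relative to the typed-query baseline."""
    return (typed_precision - spoken_precision) / typed_precision
```

For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sat on a mat" gives one substitution over six reference words. The same `precision_at_k` call would be made once on the ranking produced by the typed query and once on the ranking produced by the recognized query, and `relative_loss` then compares the two values.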

doi: 10.21437/Eurospeech.1997-371

Cite as: Barnett, J., Anderson, S., Broglio, J., Singh, M., Hudson, R., Kuo, S.W. (1997) Experiments in spoken queries for document retrieval. Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997), 1323-1326, doi: 10.21437/Eurospeech.1997-371

  author={J. Barnett and S. Anderson and J. Broglio and M. Singh and R. Hudson and S. W. Kuo},
  title={{Experiments in spoken queries for document retrieval}},
  booktitle={Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997)},