ISCA Archive Interspeech 2022

Streaming Intended Query Detection using E2E Modeling for Continued Conversation

Shuo-Yiin Chang, Guru Prakash, Zelin Wu, Tara Sainath, Bo Li, Qiao Liang, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman

In voice-enabled applications, a predetermined hotword is usually used to activate a device so that it attends to the query. However, repeating the hotword for every query introduces a cognitive burden in continued conversations. To avoid repeating the hotword, we propose a streaming end-to-end (E2E) intended query detector that identifies utterances directed towards the device and filters out utterances that are not. The proposed approach incorporates the intended query detector into the E2E model, which already folds the different components of the speech recognition pipeline into a single neural network. Jointly modeling speech decoding and intended query detection in the E2E model also allows the system to declare an intended query early, based on partial recognition results, which is important for reducing latency and keeping the system responsive. We demonstrate that the proposed E2E approach yields a 22% relative improvement in equal error rate (EER) for detection accuracy and a 600 ms latency improvement compared with an independent intended query detector. In our experiments, the proposed model detects whether the user is talking to the device with an 8.7% EER and a median latency of 1.4 seconds after the user starts speaking.
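Detection accuracy is reported as an equal error rate (EER), the operating point at which the false-accept rate (non-directed speech accepted as a query) equals the false-reject rate (intended queries rejected). The following is a minimal sketch of how that metric can be computed from per-utterance detector scores. It illustrates only the metric and is not the authors' evaluation code. The function name `equal_error_rate`, the convention that higher scores mean "more likely device-directed", and the choice to sweep only the observed scores as thresholds are all assumptions made for illustration.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the EER by sweeping every observed score as a decision threshold.

    scores: hypothetical detector confidence that an utterance is an intended query.
    labels: 1 for device-directed (intended) queries, 0 for all other speech.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    best_gap, best_eer = float("inf"), 1.0
    # Accept an utterance as an intended query when its score >= threshold.
    for t in sorted(set(scores)):
        far = sum(s >= t for s in neg) / len(neg)  # other speech wrongly accepted
        frr = sum(s < t for s in pos) / len(pos)   # intended queries wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Toy scores with one overlapping pair between the two classes.
    print(equal_error_rate([0.9, 0.6, 0.4, 0.7, 0.3, 0.1], [1, 1, 1, 0, 0, 0]))
```

Sweeping only the observed scores keeps the sketch dependency-free; with large evaluation sets one would typically interpolate between ROC points instead, since FAR and FRR rarely coincide exactly at an observed threshold.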