ISCA Archive Interspeech 2024

Custom wake word detection

Kesavaraj V, Charan Devarkonda, Vamshiraghusimha Narasinga, Anil Kumar Vuppala

Identifying keywords in an open-vocabulary context is crucial for personalizing interactions with smart devices. Previous methods for open-vocabulary keyword spotting relied on a shared embedding space created by audio and text encoders. However, they suffered from heterogeneous modality representations, which caused audio-text mismatch. To tackle this issue, our proposed framework uses knowledge from a pre-trained text-to-speech (TTS) system. This knowledge transfer incorporates awareness of audio projections into the text representations derived from the text encoder. Consequently, the approach helps prevent false triggers in scenarios such as audio-text pairs with closely related pronunciations. Additionally, the keyword embedding is computed only during the keyword enrollment phase. The proposed system performs consistently across all word lengths.
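
The enrollment-time embedding calculation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `KeywordSpotter`, the injected `text_encoder` and `audio_encoder` callables, the cosine-similarity scoring, and the `threshold` value are all assumptions made for illustration, and the TTS-based knowledge transfer itself is not modeled. The sketch only shows that the text encoder runs once per keyword at enrollment, while detection reuses the stored embedding.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class KeywordSpotter:
    """Hypothetical wrapper: enroll keywords from text, then match audio against them."""

    def __init__(
        self,
        text_encoder: Callable[[str], Vector],
        audio_encoder: Callable[[Sequence[float]], Vector],
        threshold: float = 0.8,  # assumed decision threshold, not from the paper
    ) -> None:
        self.text_encoder = text_encoder
        self.audio_encoder = audio_encoder
        self.threshold = threshold
        self.enrolled: Dict[str, Vector] = {}

    def enroll(self, keyword: str) -> None:
        # The text encoder is called only here, once per enrolled keyword.
        self.enrolled[keyword] = self.text_encoder(keyword)

    def detect(self, audio: Sequence[float]) -> List[str]:
        # Detection encodes the audio and compares it with the cached keyword
        # embeddings; no text encoding happens at this stage.
        audio_embedding = self.audio_encoder(audio)
        return [
            keyword
            for keyword, embedding in self.enrolled.items()
            if cosine_similarity(audio_embedding, embedding) >= self.threshold
        ]
```

In use, a caller would construct `KeywordSpotter` with its own trained encoders, call `enroll` once for each custom keyword, and then call `detect` on each incoming audio segment. Caching the keyword embedding this way keeps the per-segment cost limited to one audio encoding plus a similarity check per enrolled keyword.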