ISCA Archive Interspeech 2022

Keyword Spotting with Synthetic Data using Heterogeneous Knowledge Distillation

Yuna Lee, Seung Jun Baek

It is crucial that Keyword Spotting (KWS) systems learn to understand new classes of user-defined keywords, but this is a challenging task that requires high-quality audio datasets. We propose KWS with Heterogeneous Embedding Knowledge Distillation (HEKD), which uses only synthetic data for unseen keyword classes. In HEKD, a reference model transfers heterogeneous knowledge of seen classes to a student model that classifies keywords of unseen classes. We show that, by mimicking the embedding function of the reference model trained on real data via a contrastive learning approach, the student model can learn to discriminate unseen keyword classes guided by synthetic data. In addition, we propose to maximize the dispersion of the embedding clusters of unseen keywords, with approximation guarantees, in order to enhance inter-class variability. Experiments show that HEKD outperforms baseline schemes that use few-shot learning and those pre-trained on a large volume of data, demonstrating its effectiveness and efficiency.
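
The two ingredients named in the abstract, contrastive matching of student embeddings to reference embeddings and dispersion of per-class embedding clusters, can be illustrated with a minimal NumPy sketch. This is not the HEKD implementation: the abstract does not give the loss or the dispersion objective, so the InfoNCE-style loss, the row-wise pairing of student and reference embeddings, the temperature value, and the centroid-distance dispersion measure below are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def contrastive_distillation_loss(student_emb, reference_emb, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in, not the paper's loss).

    Row i of student_emb is treated as the positive pair of row i of
    reference_emb; every other row of reference_emb acts as a negative.
    """
    s = l2_normalize(student_emb)
    r = l2_normalize(reference_emb)
    logits = s @ r.T / temperature                       # shape (N, N)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean negative log-probability of the matching (diagonal) pairs.
    return float(-np.mean(np.diag(log_prob)))


def centroid_dispersion(emb, labels):
    """Mean pairwise Euclidean distance between class centroids.

    An assumed, simple proxy for the dispersion of embedding clusters.
    """
    classes = np.unique(labels)
    if len(classes) < 2:
        raise ValueError("need at least two classes to measure dispersion")
    centroids = np.stack([emb[labels == c].mean(axis=0) for c in classes])
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)               # shape (K, K), zero diagonal
    k = len(classes)
    return float(dists.sum() / (k * (k - 1)))
```

In a training loop under these assumptions, one would minimize the contrastive loss while encouraging a larger dispersion value, for example by subtracting a weighted dispersion term from the loss. How HEKD actually combines the two objectives, and how its approximation guarantees are obtained, is described in the paper rather than here.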