ISCA Archive Interspeech 2023

Dual-Memory Multi-Modal Learning for Continual Spoken Keyword Spotting with Confidence Selection and Diversity Enhancement

Zhao Yang, Dianwen Ng, Xizhe Li, Chong Zhang, Rui Jiang, Wei Xi, Yukun Ma, Chongjia Ni, Jizhong Zhao, Bin Ma, Eng Siong Chng

Enabling continual learning (CL) in an ever-changing environment is highly valuable, but it poses significant challenges for spoken keyword spotting (KWS), which must cope with both the variability in the acoustic characteristics of speech signals and catastrophic forgetting. In this paper, we propose a novel framework for replay-based CL in KWS that uses a Dual-Memory Multi-Modal (DM3) structure to enhance generalizability and robustness. Our approach employs a dual-memory structure in which a short-term model and a long-term model adaptively capture near-term and long-term knowledge, respectively, and a multi-modal structure that exploits the consistency across multiple speech perturbations to improve robustness. Additionally, we introduce a class-balanced selection strategy that uses confidence scores to rank training samples. Experiments demonstrate the effectiveness of our method over competitive baselines in both class-incremental and domain-incremental KWS settings.
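To make the class-balanced, confidence-based selection concrete, the sketch below shows one plausible reading of that step for a replay buffer: within each keyword class, candidates are ranked by model confidence and an equal per-class share is retained. This is a minimal illustration, not the paper's exact recipe; the function name, the equal-share budget, and the choice to keep the most confident samples first are all assumptions.

```python
import numpy as np

def select_replay_samples(labels, confidences, memory_size):
    """Class-balanced, confidence-ranked replay selection (illustrative sketch).

    labels:      int array of class labels, one per training sample
    confidences: float array of model confidence scores, one per sample
    memory_size: total replay-buffer budget across all classes

    Note: equal per-class budget and most-confident-first ordering are
    assumptions; the paper's actual criterion may differ.
    """
    classes = np.unique(labels)
    per_class = memory_size // len(classes)  # equal share per class (assumed)
    selected = []
    for c in classes:
        idx = np.where(labels == c)[0]
        # Rank this class's samples by confidence, highest first.
        ranked = idx[np.argsort(-confidences[idx])]
        selected.extend(ranked[:per_class].tolist())
    return selected
```

Ranking within each class before truncating keeps the buffer balanced across keywords, so rare classes are not crowded out of the replay memory by confident samples from frequent ones.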