ISCA Archive Interspeech 2025

SIDC-KWS: Efficient Spiking Inception-Dilated Conformer with Self-Attention for Keyword Spotting

Jin Gyo Lim, Seong Eun Kim

Recent deep learning advances have improved keyword spotting (KWS). However, as KWS is deployed on edge devices, energy efficiency remains a key challenge. Conventional deep neural networks offer high accuracy but require heavy computation, making them unsuitable for low-power use. To address this, we propose the Spiking Inception-Dilated Conformer for Keyword Spotting (SIDC-KWS), an energy-efficient transformer based on spiking neural networks (SNNs). By integrating an Inception-Dilated (ID) block and spike-based self-attention, SIDC-KWS maintains high accuracy while significantly reducing power consumption. Experiments on the Google Speech Commands V2 (GSC V2) dataset show that SIDC-KWS achieves 96.8% and 94.7% accuracy on 12-class and 35-class tasks, respectively. On the 35-class task, SIDC-KWS consumes 75.59% less energy than its ANN counterpart. These results underscore SNNs as a scalable, low-power alternative for real-time KWS in resource-limited environments.
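The energy savings come from the spike-based self-attention the abstract mentions: when queries, keys, and values are binarized by spiking neurons, the attention products reduce to accumulate-only operations instead of full multiply-accumulates. The sketch below is a minimal, hypothetical illustration of that idea (the LIF threshold, decay, and the exact attention formulation are assumptions, not the paper's actual SIDC-KWS design):

```python
import numpy as np

def lif_spikes(x, threshold=1.0, decay=0.5):
    """Leaky integrate-and-fire neuron sketch: the membrane potential
    integrates input over time and emits a binary spike (then hard-resets)
    when it crosses the threshold. Parameters are illustrative."""
    v = np.zeros(x.shape[1:])
    spikes = np.zeros_like(x)
    for t in range(x.shape[0]):
        v = decay * v + x[t]
        spikes[t] = (v >= threshold).astype(float)
        v = np.where(v >= threshold, 0.0, v)  # hard reset after a spike
    return spikes

def spike_self_attention(x, Wq, Wk, Wv, scale):
    """Spike-based self-attention sketch: Q, K, V are binarized by LIF
    neurons, so Q @ K^T and the weighting of V involve only 0/1 operands
    and can be realized as additions on neuromorphic hardware."""
    q = lif_spikes(x @ Wq)                     # (T, N, D) binary spikes
    k = lif_spikes(x @ Wk)
    v = lif_spikes(x @ Wv)
    attn = q @ k.transpose(0, 2, 1) * scale    # (T, N, N) spike correlations
    return attn @ v                            # (T, N, D)

rng = np.random.default_rng(0)
T, N, D = 4, 6, 8                              # time steps, tokens, features
x = rng.normal(size=(T, N, D))
W = [rng.normal(scale=0.5, size=(D, D)) for _ in range(3)]
out = spike_self_attention(x, *W, scale=1.0 / D)
print(out.shape)
```

Because every operand entering the attention products is a binary spike, the softmax of a standard transformer is unnecessary here; this is the general mechanism by which spiking attention variants trade floating-point multiplies for sparse additions.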