ISCA Archive Interspeech 2022

DyConvMixer: Dynamic Convolution Mixer Architecture for Open-Vocabulary Keyword Spotting

Waseem Gharbieh, Jinmiao Huang, Qianhui Wan, Han Suk Shim, Hyun Chul Lee

Research on user-defined keyword spotting has been gaining popularity in recent years. Building an open-vocabulary keyword spotting system with high accuracy and low power consumption remains a challenging problem. In this paper, we propose the DyConvMixer model to address this problem. By leveraging dynamic convolution alongside a convolutional equivalent of the MLPMixer architecture, we obtain an efficient and effective model that has fewer than 200K parameters and uses fewer than 11M MACs. Although our model is less than half the size of state-of-the-art RNN and CNN models, it shows competitive results on the publicly available Hey-Snips and Hey-Snapdragon datasets. In addition, we discuss the importance of designing an effective evaluation system and detail our evaluation pipeline to enable comparison with future work.