ISCA Archive Interspeech 2022

ResectNet: An Efficient Architecture for Voice Activity Detection on Mobile Devices

Okan Köpüklü, Maja Taseska

We present ResectNet, a RESource Efficient and CompacT Convolutional Recurrent Neural Network architecture for Voice Activity Detection (VAD) on mobile devices, which achieves state-of-the-art performance with fewer than 12k parameters. ResectNet operates on raw audio signals and consists of sinc convolutions, depthwise convolutions, grouped pointwise convolutions, a frequency shift module, and a gated recurrent unit. We propose a simple width-multiplier hyperparameter that scales ResectNet to the desired trade-off between efficiency and performance. We present a detailed ablation study of the trade-offs between resource usage and performance on the VAD task.
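To illustrate why depthwise and grouped pointwise convolutions keep the parameter count low, and how a width multiplier could scale it, the following is a minimal sketch of the weight-counting arithmetic for 1D convolutions. It is not the ResectNet layer configuration: the channel count, kernel size, group count, rounding divisor, and all function names are hypothetical values chosen for illustration, and biases are ignored.

```python
# Weight-count arithmetic for 1D convolutions (biases ignored).
# All names and numbers here are illustrative assumptions, not taken from the paper.


def standard_conv1d_params(c_in, c_out, kernel_size):
    """Weights in a dense 1D convolution: every output channel sees every input channel."""
    return c_in * c_out * kernel_size


def depthwise_conv1d_params(channels, kernel_size):
    """Weights in a depthwise 1D convolution: one filter per channel."""
    return channels * kernel_size


def grouped_pointwise_params(c_in, c_out, groups):
    """Weights in a 1x1 convolution split into groups of channels."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out


def factorized_block_params(channels, kernel_size, groups):
    """Depthwise convolution followed by a grouped pointwise convolution."""
    return (depthwise_conv1d_params(channels, kernel_size)
            + grouped_pointwise_params(channels, channels, groups))


def scale_channels(base_channels, width_multiplier, divisor=4):
    """Scale a channel count by a width multiplier, rounded to a multiple of divisor.

    Rounding to a multiple of `divisor` is an assumption made here so that the
    scaled width stays divisible by the (hypothetical) group count.
    """
    scaled = int(base_channels * width_multiplier / divisor + 0.5) * divisor
    return max(divisor, scaled)


if __name__ == "__main__":
    base, kernel, groups = 32, 9, 4  # hypothetical layer settings
    for width in (0.5, 1.0, 2.0):
        c = scale_channels(base, width, divisor=groups)
        dense = standard_conv1d_params(c, c, kernel)
        compact = factorized_block_params(c, kernel, groups)
        print(f"width={width}: channels={c}, dense={dense}, factorized={compact}")
```

With these hypothetical settings at width 1.0, the dense layer needs 9216 weights while the factorized block needs 544. Because the pointwise term grows quadratically with the channel count, halving or doubling the width changes the parameter budget much faster than linearly, which is what makes a single width multiplier a convenient efficiency knob.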