ISCA Archive Interspeech 2023

Adaptive Neural Network Quantization For Lightweight Speaker Verification

Haoyu Wang, Bei Liu, Yifei Wu, Yanmin Qian

Recently, speaker verification systems have benefited from deep neural networks, and the size of speaker embedding encoders has grown with these increasingly sophisticated architectures. Nevertheless, mobile devices have inadequate memory for oversized embedding extractors, which demands compact networks. In this paper, we explore neural network quantization for model compression. Specifically, we first propose a novel uniform quantization method based on K-Means clustering. Then, to further improve small-model performance, mixed-precision quantization is introduced. In addition, we implement a multi-stage fine-tuning (MSFT) recipe to boost the accuracy of the mixed-precision model. In experiments, the performance degradation of a 4-bit quantized ResNet34 is negligible. Our quantized models outperform previous model compression methods in terms of both size and accuracy, and mixed-precision quantization with the MSFT strategy further improves model performance.
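To make the idea concrete, the following is a minimal illustrative sketch of clustering-based weight quantization with a 1-D K-Means codebook of 2^b levels. It is not the paper's exact recipe (the authors derive a uniform quantization grid from K-Means clustering, and apply it within a full training pipeline); the function name `kmeans_quantize` and all parameters here are assumptions for illustration only.

```python
import numpy as np

def kmeans_quantize(weights, n_bits=4, n_iter=50):
    """Quantize a weight tensor to 2**n_bits levels via 1-D K-Means.

    Illustrative sketch only: each weight is mapped to its nearest
    learned centroid, yielding a codebook of 2**n_bits values.
    """
    w = weights.ravel().astype(np.float64)
    k = 2 ** n_bits
    # initialize centroids from quantiles of the weight distribution
    centroids = np.quantile(w, np.linspace(0.0, 1.0, k))
    for _ in range(n_iter):
        # assignment step: nearest centroid for every weight
        idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        # update step: move each centroid to its cluster mean
        for j in range(k):
            members = w[idx == j]
            if members.size:
                centroids[j] = members.mean()
    idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids[idx].reshape(weights.shape), centroids

# toy usage: quantize a random 64x64 weight matrix to 4 bits
rng = np.random.default_rng(1)
w = rng.normal(size=(64, 64)).astype(np.float32)
wq, codebook = kmeans_quantize(w, n_bits=4)
print(np.unique(wq).size)  # at most 2**4 = 16 distinct values
```

A non-uniform codebook like this adapts its levels to the weight distribution; a uniform grid (as in the paper) additionally allows integer-arithmetic inference, which is why the authors fit the grid rather than using raw centroids directly.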