Recently, speaker verification systems have benefited from deep neural networks, and the size of speaker embedding extractors has grown with these increasingly sophisticated architectures. However, mobile devices have insufficient memory for oversized embedding extractors, which calls for compact networks. In this paper, we explore neural network quantization for model compression. Specifically, we first propose a novel uniform quantization method based on K-means clustering. Then, to further improve the performance of small models, we introduce mixed-precision quantization. We also implement a multi-stage fine-tuning (MSFT) recipe to boost the accuracy of mixed-precision models. In experiments, the performance degradation of the 4-bit quantized ResNet34 is negligible, and our quantized models outperform previous model compression methods in both size and accuracy. Moreover, mixed-precision quantization with the MSFT strategy further improves model performance.
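To make the core idea concrete, the sketch below shows plain b-bit uniform quantization of a weight tensor. Note that the clipping range here is simply the min/max of the weights; the paper's method would instead derive the quantization range from K-means cluster statistics of the weights, which is not reproduced here. The function name and setup are illustrative, not taken from the paper.

```python
import numpy as np

def uniform_quantize(w, bits=4):
    """Uniformly quantize a weight tensor to 2**bits levels.

    The clipping range here is just [w.min(), w.max()]; a K-means-based
    variant (as proposed in the paper) would derive the range from the
    cluster centroids of the weights instead.
    """
    levels = 2 ** bits
    lo, hi = float(w.min()), float(w.max())
    step = (hi - lo) / (levels - 1)
    q = np.round((w - lo) / step)   # integer codes in [0, levels - 1]
    return lo + q * step            # dequantized (fake-quantized) weights

rng = np.random.default_rng(0)
weights = rng.normal(size=1000).astype(np.float32)
wq = uniform_quantize(weights, bits=4)
# At most 2**4 = 16 distinct values remain after quantization.
assert len(np.unique(wq)) <= 16
```

With 4 bits, each weight is representable by a 4-bit code plus the shared scale and offset, giving roughly an 8x reduction over 32-bit floats.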