ISCA Archive Interspeech 2022

NRI-FGSM: An Efficient Transferable Adversarial Attack for Speaker Recognition Systems

Hao Tan, Junjian Zhang, Huan Zhang, Le Wang, Yaguan Qian, Zhaoquan Gu

Deep neural networks (DNNs), though widely applied in Speaker Recognition Systems (SRSs), are vulnerable to adversarial attacks that are hard for humans to detect. Assessing black-box vulnerability to adversarial attacks is crucial for the robustness of SRSs, especially for recent models such as x-vector and ECAPA-TDNN. State-of-the-art transferable adversarial attack methods first generate adversarial audio on a white-box SRS and then use that audio to attack a black-box SRS. However, these methods often achieve lower success rates on SRSs than in the image processing domain. To improve attack performance on SRSs, we propose an efficient Nesterov-accelerated, RMSProp-optimized Iterative Fast Gradient Sign Method (NRI-FGSM), which integrates the Nesterov Accelerated Gradient method and the Root Mean Squared Propagation optimizer with an adaptive step size. Through extensive experiments on both closed-set speaker recognition (CSR) and open-set speaker recognition (OSR) tasks, our method achieves higher attack success rates than prior methods, reaching 97.8% on CSR and 61.9% on OSR, while maintaining a lower perturbation level as measured by signal-to-noise ratio (SNR) and perceptual evaluation of speech quality (PESQ). It is worth mentioning that our work is the first to successfully attack the ECAPA-TDNN SRS model.
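
Since only the abstract is available here, the exact update rule of NRI-FGSM is not specified; the following is a minimal, hypothetical sketch of how a Nesterov look-ahead (as in NI-FGSM) can be combined with an RMSProp-style adaptive step size inside an iterative FGSM loop. The grad_fn interface, the hyperparameters mu, rho, and delta, and the particular gradient normalization are illustrative assumptions, not the authors' confirmed method.

```python
# Hypothetical NRI-FGSM-style sketch (not the authors' exact update rule).
# It combines a Nesterov look-ahead with an RMSProp second-moment estimate
# that adapts the per-element step size of an iterative sign-gradient attack.
import numpy as np

def nri_fgsm(x, grad_fn, eps=0.002, steps=10, mu=1.0, rho=0.9, delta=1e-8):
    """
    x       : clean audio waveform, shape (T,)
    grad_fn : callable returning d(loss)/d(x_adv) from the white-box SRS
              (assumed interface; any differentiable surrogate model works)
    eps     : L-infinity perturbation budget
    steps   : number of attack iterations
    mu      : momentum decay for the Nesterov/momentum term
    rho     : RMSProp decay for the running second moment
    """
    alpha = eps / steps              # base step size
    g = np.zeros_like(x)             # accumulated momentum gradient
    s = np.zeros_like(x)             # RMSProp running second moment
    x_adv = x.copy()
    for _ in range(steps):
        # Nesterov look-ahead: take the gradient at the anticipated point.
        x_nes = x_adv + alpha * mu * g
        grad = grad_fn(x_nes)
        # L1-normalize and accumulate momentum, as in MI/NI-FGSM.
        g = mu * g + grad / (np.mean(np.abs(grad)) + delta)
        # RMSProp-style second moment of the raw gradient.
        s = rho * s + (1.0 - rho) * grad ** 2
        # Adaptive per-element step: larger where gradients have been small.
        step = alpha / (np.sqrt(s) + delta)
        x_adv = x_adv + step * np.sign(g)
        # Project back into the eps-ball around the clean audio.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

Under these assumptions, the RMSProp term rescales the fixed FGSM step per sample of the waveform, while the Nesterov look-ahead evaluates gradients slightly ahead of the current iterate, both of which are known to improve transferability of iterative sign-gradient attacks.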