ISCA Archive Interspeech 2023

Fooling Speaker Identification Systems with Adversarial Background Music

Chu-Xiao Zuo, Jia-Yi Leng, Wu-Jun Li

Speaker identification (SI) systems are widely used in real-world scenarios but are vulnerable to attacks from malicious users. Although existing attacks mainly focus on speech-shaped inputs, SI models can also be broken in practical use by background music (BGM) that is unrelated to speech. In this paper, we propose a new attack, called BGM Attack (BGMA), that generates auditorily natural music to deceive SI models. BGMA integrates a music generation model and an SI model to modify the music-level semantic features. We propose a linear transform called differentiable spectrogram reconstruction (DSR) that acts as a bridge for conveying gradient information between the two models in BGMA. Our experiments show that BGMA can effectively break state-of-the-art SI models with generated auditorily natural music. Our results highlight the need for SI models to be robust against attacks from non-speech inputs, and BGMA provides a novel attack method for testing the security of SI systems.
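
As a brief illustration of why a linear bridge can convey gradient information, consider the following sketch. It is not the paper's definition of DSR: the matrix $W$, the feature vector $x$, the SI input $y$, and the loss $\mathcal{L}$ are placeholder symbols assumed here for illustration only. If the output $x$ of the music generation model is mapped to the SI model's input by a linear transform, the chain rule gives the gradient with respect to $x$ in closed form:

```latex
% Illustrative only: W, x, y, and \mathcal{L} are assumed symbols, not taken from the paper.
\begin{align}
  y &= W x, \\
  \frac{\partial \mathcal{L}}{\partial x}
    &= W^{\top} \frac{\partial \mathcal{L}}{\partial y}.
\end{align}
```

Under this assumption, the gradient of the SI loss reaches the music-level features through a single multiplication by $W^{\top}$, so the two models can be optimized end to end.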