ISCA Archive Interspeech 2024

RW-VoiceShield: Raw Waveform-based Adversarial Attack on One-shot Voice Conversion

Ching-Yu Yang, Shreya G. Upadhyay, Ya-Tse Wu, Bo-Hao Su, Chi-Chun Lee

In recent years, one-shot voice conversion (VC) has advanced significantly, enabling a speaker's traits to be altered with just a single sentence. However, as this technology matures and generates increasingly realistic utterances, it also raises privacy concerns, because a single recording may be enough to replicate someone's voice. In this paper, we propose RW-VoiceShield to shield voices from replication. It does so by attacking one-shot VC models with imperceptible noise generated by a raw waveform-based generative model and added to the speaker's utterances. We test our method against the latest one-shot VC model, conducting subjective and objective evaluations under both black-box and white-box scenarios. Our results show significant differences in speaker characteristics between the utterances generated by the VC model and those of the protected speaker. Furthermore, even with adversarial noise added to the protected utterances, the speaker's distinct characteristics remain recognizable.
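The abstract describes protection as adding imperceptible, generator-produced noise directly to the raw waveform. The sketch below illustrates only that generic idea: noise from a stand-in generator is clipped to a per-sample L-infinity budget, added to the waveform, and the result is scored by signal-to-noise ratio as a rough proxy for imperceptibility. It is not the RW-VoiceShield implementation. The abstract does not specify the generator architecture, the adversarial loss against the VC model, or the perturbation constraint, so `apply_bounded_perturbation`, `snr_db`, `toy_generator`, and the `epsilon` budget are all hypothetical.

```python
import math
import random
from typing import Callable, List


def apply_bounded_perturbation(
    waveform: List[float],
    generator: Callable[[List[float]], List[float]],
    epsilon: float = 0.01,
) -> List[float]:
    """Add generator-produced noise to a raw waveform under an L-infinity budget.

    `generator` is a hypothetical stand-in for a raw waveform-based generative
    model; `epsilon` is an assumed per-sample bound on the noise magnitude.
    """
    noise = generator(waveform)
    if len(noise) != len(waveform):
        raise ValueError("generator must return one noise sample per input sample")
    protected = []
    for x, n in zip(waveform, noise):
        # Clip each noise sample to [-epsilon, epsilon] so the change stays small.
        n = max(-epsilon, min(epsilon, n))
        # Keep the protected sample inside the valid float audio range [-1, 1].
        protected.append(max(-1.0, min(1.0, x + n)))
    return protected


def snr_db(clean: List[float], perturbed: List[float]) -> float:
    """Signal-to-noise ratio (dB) of the clean signal relative to the added noise."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((p - x) ** 2 for x, p in zip(clean, perturbed))
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    sample_rate = 16000
    # One second of a 220 Hz tone as a stand-in for a speech utterance.
    clean = [0.5 * math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
    rng = random.Random(0)

    def toy_generator(wav: List[float]) -> List[float]:
        # Hypothetical generator: uniform noise larger than the budget, to exercise clipping.
        return [rng.uniform(-0.05, 0.05) for _ in wav]

    protected = apply_bounded_perturbation(clean, toy_generator, epsilon=0.01)
    print(f"max |delta| = {max(abs(p - c) for c, p in zip(clean, protected)):.4f}")
    print(f"SNR = {snr_db(clean, protected):.1f} dB")
```

In an actual attack, the generator would be trained so that VC output conditioned on the protected utterance drifts away from the protected speaker. The norm bound only keeps the change small; it does not by itself make the noise adversarial.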