Imperceptible Adversarial Watermarking to Prevent Voice Cloning
Seoyoung Park, Thien-Phuc Doan, Souhwan Jung
Recent advances in speech synthesis have enabled voice cloning from just a few seconds of audio, posing a serious threat to speaker verification systems. Although many deepfake detection methods have been proposed, they are inherently reactive and cannot prevent cloning in the first place. In contrast, adversarial defenses take a proactive approach by interfering with the voice representation process within speaker encoders. In this work, we build on the AntiFake framework and introduce three key improvements: we incorporate perceptual constraints to preserve audio quality, propose an efficient target-selection strategy, and design a loss function that balances proximity to the target embedding against separation from the original speaker identity. To validate the generality of our approach, we further extend our evaluation to recent zero-shot TTS models. Experiments demonstrate that our method provides effective protection against voice cloning with minimal perceptual degradation.
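As a rough illustration of the balanced objective described above, the following PyTorch-style sketch combines a proximity term toward a chosen target embedding with a separation term away from the original speaker embedding. This is a minimal sketch under assumed conventions, not the paper's actual formulation: the names (protection_loss, lam) and the cosine-distance form are hypothetical, and the perceptual constraint on the perturbation is omitted.

```python
import torch
import torch.nn.functional as F

def protection_loss(encoder, x_adv, target_emb, source_emb, lam=1.0):
    """Hypothetical sketch of a balanced embedding loss for an
    adversarially perturbed waveform x_adv."""
    # Embed the perturbed audio and L2-normalize, as is common for
    # speaker encoders that compare identities by cosine similarity.
    emb = F.normalize(encoder(x_adv), dim=-1)
    # Proximity term: cosine distance to the selected target embedding.
    d_target = 1.0 - F.cosine_similarity(emb, target_emb, dim=-1)
    # Separation term: cosine distance to the original speaker embedding.
    d_source = 1.0 - F.cosine_similarity(emb, source_emb, dim=-1)
    # Minimizing this pulls the embedding toward the target while
    # pushing it away from the original identity; lam sets the balance.
    return (d_target - lam * d_source).mean()
```

In practice, the waveform perturbation would be optimized against such a loss under a perceptual budget (for example, an L-infinity bound or a psychoacoustic masking penalty), which this sketch leaves out.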