ISCA Archive Interspeech 2025

Defending Unauthorized Voice Cloning with Watermark-Aware Codecs

Jiankun Zhao, Lingwei Meng, Chengxi Deng, Helen Meng, Xixin Wu

The proliferation of zero-shot TTS models increases the risk of malicious voice cloning from copyrighted speech prompts. Although audio watermarking is an effective way to encode copyright information, attackers can still use watermarked speech as prompts to synthesize unwatermarked speech with the same speaker identity. To protect copyrighted voices from being cloned, this study introduces a method for training open-source TTS models to reject watermarked speech prompts. We observe that mainstream zero-shot TTS models typically rely on pre-trained codec encoders to process speech prompts. We therefore train the codec to "mute" when it encounters watermarked audio, which degrades the quality of the generated speech and thereby implicitly prevents zero-shot TTS models from cloning watermarked voices. Experiments show that our approach is robust against various attacks while maintaining high-quality TTS given unwatermarked speech prompts.
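
The "mute" behaviour described above can be illustrated with a toy training objective. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the loss, so the choice of an all-zero (silent) target and a mean-squared error is an assumption, and the names `mute_target`, `mse`, and `mute_loss` are hypothetical. Waveforms are plain lists of samples, and the watermark label is assumed to be known at training time.

```python
from typing import List, Sequence


def mute_target(waveform: Sequence[float], is_watermarked: bool) -> List[float]:
    """Return the reconstruction target for one training example.

    Assumption: a watermarked input should map to silence (all zeros),
    while an unwatermarked input should be reconstructed as-is.
    """
    if is_watermarked:
        return [0.0] * len(waveform)
    return list(waveform)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean-squared error between two equal-length, non-empty sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if len(target) == 0:
        raise ValueError("sequences must be non-empty")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def mute_loss(
    codec_output: Sequence[float],
    waveform: Sequence[float],
    is_watermarked: bool,
) -> float:
    """Toy loss: penalize the codec output against the 'mute' target."""
    return mse(codec_output, mute_target(waveform, is_watermarked))


if __name__ == "__main__":
    clean = [0.5, -0.5]
    # A codec that copies a watermarked input is penalized ...
    print(mute_loss(clean, clean, is_watermarked=True))
    # ... while one that outputs silence for it is not.
    print(mute_loss([0.0, 0.0], clean, is_watermarked=True))
```

Under this assumed objective, a codec that faithfully reconstructs a watermarked prompt incurs a loss, whereas the same reconstruction of an unwatermarked prompt does not, which is the asymmetry the abstract relies on to degrade cloning from watermarked prompts.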