ISCA Archive SPSC 2025

Defence Against the Deepfake Arts: Improving Audio Deepfake Detection With Context Awareness

Ivanina Ivanova, Abhay Dayal Mathur, Nicoline Nymand-Andersen, Nils Holzenberger
The increasing use of generative AI models to create realistic deepfakes poses significant challenges to information security, particularly in the realm of audio manipulation. Current audio deepfake detection methods focus on the acoustic signal and learn to spot artifacts produced by specific deepfake generators, which makes them inherently brittle when deployed on real-world data. We start from the intuitive observation that, in an audio deepfake, the person heard on the recording did not author the content of the utterance. This leads us to leverage speaker representations for speech and author representations for text. We introduce a novel multimodal approach, Defence Against the Deepfake Arts (DADA), in which two independent models, one for speech and one for text, are trained using contrastive learning and then feed into a classifier fine-tuned for deepfake detection. We show empirically that our method transfers robustly across multiple deepfake generation methods, setting a new state of the art in equal error rate (EER) on multiple benchmarks.
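For readers unfamiliar with the EER metric mentioned above, the following is a minimal sketch of how an equal error rate can be approximated from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that higher scores mean "more likely fake", and the label encoding (1 = deepfake, 0 = genuine) are all assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping a threshold over the observed scores.

    Assumed conventions (illustrative, not taken from the paper):
      - a higher score means the detector considers the clip more likely fake
      - labels use 1 for deepfake and 0 for genuine
    """
    genuine = [s for s, y in zip(scores, labels) if y == 0]
    fake = [s for s, y in zip(scores, labels) if y == 1]
    if not genuine or not fake:
        raise ValueError("need at least one genuine and one fake example")

    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        # False positive rate: genuine clips flagged as fake at threshold t.
        fpr = sum(s >= t for s in genuine) / len(genuine)
        # False negative rate: fake clips that fall below threshold t.
        fnr = sum(s < t for s in fake) / len(fake)
        gap = abs(fpr - fnr)
        if gap < best_gap:
            # The EER is the error rate where the two rates meet; on a finite
            # sample we take the midpoint at the closest crossing.
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


# Hypothetical scores for four genuine clips followed by four deepfakes.
example_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
example_labels = [0, 0, 0, 0, 1, 1, 1, 1]
example_eer = equal_error_rate(example_scores, example_labels)
```

A lower EER indicates a better detector. Published benchmarks typically interpolate along the full ROC curve rather than using this coarse threshold sweep, so values from this sketch may differ slightly from those of a standard toolkit.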