ISCA Archive Interspeech 2024

Exploring Self-supervised Embeddings and Synthetic Data Augmentation for Robust Audio Deepfake Detection

Juan M. Martín-Doñas, Aitor Álvarez, Eros Rosello, Angel M. Gomez, Antonio M. Peinado

This work explores the performance of large self-supervised speech models as robust audio deepfake detectors. Despite the current trend of fine-tuning the upstream network, in this paper we revisit the use of pre-trained models as feature extractors on top of which specialized downstream audio deepfake classifiers are trained. The goal is to preserve the general knowledge of the audio foundation model so that it extracts discriminative features to feed a simplified deepfake classifier. In addition, the generalization capabilities of the system are improved by augmenting the training corpora with additional synthetic data from different vocoder algorithms. This strategy is complemented by various data augmentations covering challenging acoustic conditions. Our proposal is evaluated on different benchmark datasets for audio deepfake and anti-spoofing tasks, showing state-of-the-art performance. Furthermore, we analyze which parts of the downstream classifier are most relevant for achieving a robust system.
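The frozen-upstream idea described above can be illustrated with a minimal, purely hypothetical sketch. It is not the system evaluated in this paper: the "upstream" is a fixed random projection standing in for a pre-trained self-supervised model, the downstream classifier is a plain logistic regression, and the two classes are synthetic noise signals rather than speech. All names and constants (FROZEN_W, extract_embedding, LinearDetector, toy_utterance, FRAME_LEN, FEAT_DIM) are assumptions introduced for illustration. The only point shown is that the extractor weights stay untouched while the downstream parameters alone are trained.

```python
import math
import random

random.seed(0)

FRAME_LEN = 16  # samples per frame (toy value, an assumption)
FEAT_DIM = 4    # embedding size (toy value, an assumption)

# Stand-in for a pre-trained upstream model: a fixed random projection.
# These weights are never updated, mimicking a frozen feature extractor.
FROZEN_W = [[random.gauss(0, 1) / math.sqrt(FRAME_LEN) for _ in range(FRAME_LEN)]
            for _ in range(FEAT_DIM)]


def extract_embedding(waveform):
    """Frame the signal, project each frame with the frozen weights,
    rectify, and mean-pool over time into one utterance-level embedding."""
    n_frames = len(waveform) // FRAME_LEN
    pooled = [0.0] * FEAT_DIM
    for f in range(n_frames):
        frame = waveform[f * FRAME_LEN:(f + 1) * FRAME_LEN]
        for d, row in enumerate(FROZEN_W):
            pooled[d] += abs(sum(w * x for w, x in zip(row, frame)))
    return [p / n_frames for p in pooled]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LinearDetector:
    """Trainable downstream classifier: logistic regression on the embedding."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def score(self, emb):
        return sigmoid(sum(w * e for w, e in zip(self.w, emb)) + self.b)

    def sgd_step(self, emb, label, lr=0.5):
        # label - score is the log-likelihood gradient w.r.t. the logit.
        err = label - self.score(emb)
        self.w = [w + lr * err * e for w, e in zip(self.w, emb)]
        self.b += lr * err


def toy_utterance(scale, n_samples=800):
    """Synthetic Gaussian noise; a placeholder for a real speech signal."""
    return [random.gauss(0, scale) for _ in range(n_samples)]


# Toy labels: 0 and 1 play the roles of the two classes; the amplitude
# difference is an arbitrary cue chosen only to make the toy task learnable.
data = [(toy_utterance(1.0), 0) for _ in range(20)]
data += [(toy_utterance(0.25), 1) for _ in range(20)]

frozen_before = [row[:] for row in FROZEN_W]

# The upstream is frozen, so embeddings can be computed once and cached.
embeddings = [(extract_embedding(wav), y) for wav, y in data]

detector = LinearDetector(FEAT_DIM)
for _ in range(200):
    random.shuffle(embeddings)
    for emb, y in embeddings:
        detector.sgd_step(emb, y)  # updates only the downstream parameters

accuracy = sum((detector.score(e) > 0.5) == bool(y)
               for e, y in embeddings) / len(embeddings)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

Because the extractor is never updated, its outputs can be cached once and reused across training epochs, which is one practical reason to prefer feature extraction over fine-tuning when the downstream classifier is small.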