ISCA Archive Interspeech 2024

Spoofed Speech Detection with a Focus on Speaker Embedding

Hoan My Tran, David Guennec, Philippe Martin, Aghilas Sini, Damien Lolive, Arnaud Delhay, Pierre-François Marteau

Self-Supervised Learning (SSL) models excel as feature extractors in downstream speech tasks, including spoofed speech detection, which has become increasingly important with the rise of audio deepfakes generated by Text-To-Speech (TTS) and Voice Conversion (VC) technologies. To address this problem, we propose a novel approach that relies on speaker embeddings, using a fine-tuned WavLM model with layer-wise attentive statistics pooling, trained with a combination of supervised contrastive and cross-entropy losses. Evaluation on the Logical Access (LA) and DeepFake (DF) tasks on ASVspoof 2019 and 2021 highlights its potential for detecting audio deepfakes, with the contrastive loss producing more stable results across test sets.
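As an illustration of the training objective named above, the following is a minimal sketch of a supervised contrastive loss added to a cross-entropy loss, written in plain Python for readability. It is not the implementation used in this work: the function names, the temperature of 0.1, the `weight` parameter, and the choice to sum the two terms are all assumptions made for the example, since the abstract does not state how the losses are combined.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (guarding against a zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss, averaged over anchors with >= 1 positive.

    `temperature` is an assumed value, not one taken from the paper.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, n_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Scaled cosine similarity between the anchor and every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


def cross_entropy(logits, labels):
    # Mean softmax cross-entropy over a batch of logit rows.
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[y]
    return total / len(labels)


def combined_loss(embeddings, logits, labels, weight=1.0, temperature=0.1):
    # Assumed combination: cross-entropy plus a weighted contrastive term.
    return cross_entropy(logits, labels) + weight * supcon_loss(
        embeddings, labels, temperature
    )
```

In this sketch the labels would be 0 for bona fide and 1 for spoofed utterances. The contrastive term pulls embeddings of the same class together and pushes the two classes apart, while the cross-entropy term trains the classifier on top of them.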