ISCA Archive Interspeech 2025

Enhancing Audio Deepfake Detection by Improving Representation Similarity of Bonafide Speech

Seung-bin Kim, Hyun-seo Shin, Jungwoo Heo, Chan-yeong Lim, Kyo-Won Koo, Jisoo Son, Sanghyun Hong, Souhwan Jung, Ha-Jin Yu

The key to audio deepfake detection is distinguishing bonafide speech from carefully generated spoofed speech: the more separable the two are, the more accurate and generalizable detection becomes. In this work, we propose a novel approach that enhances this separability in the latent space. Inspired by one-class classification, we formulate an objective function that, during training, compacts bonafide embeddings while dispersing spoofed speech embeddings. The objective consists of two key components: a Bonafide-Pair Learning (BPL) loss and an Extended One-Class Softmax (EOC-S) loss. The BPL loss reduces intra-class variance by aligning the embeddings of augmented bonafide pairs, while the EOC-S loss uses Adam-based centroid updates and margin constraints to reinforce separation from spoofed data. Experimental results on the ASVspoof datasets demonstrate that the proposed approach improves detection performance across diverse attack scenarios.
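To make the two loss terms more concrete, here is a minimal NumPy sketch built on several assumptions. The BPL term is assumed to be one minus the cosine similarity between embeddings of two augmented views of the same bonafide utterance. The EOC-S term is approximated by a plain OC-Softmax-style loss, where bonafide cosine scores against a learnable centroid are pushed above a margin `m_bona` and spoof scores below `m_spoof`, and the centroid is updated with a hand-written Adam step. The names `bpl_loss`, `oc_softmax`, and `AdamCentroid` and all hyperparameters (margins, scale `alpha`, learning rate) are hypothetical. The abstract does not give the exact EOC-S extension or the weighting between the two terms, and the embeddings below are synthetic rather than outputs of a trained encoder.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def bpl_loss(emb_a, emb_b):
    """Hypothetical BPL form: mean (1 - cosine similarity) between embeddings
    of two augmented views of the same bonafide utterances, shape (N, D)."""
    a, b = l2_normalize(emb_a), l2_normalize(emb_b)
    return float(np.mean(1.0 - np.sum(a * b, axis=-1)))


def oc_softmax(emb, labels, centroid, m_bona=0.9, m_spoof=0.2, alpha=20.0):
    """OC-Softmax-style loss and its gradient w.r.t. the unnormalized centroid.

    labels: 0 = bonafide, 1 = spoof. Margins and scale are assumed values.
    """
    x_hat = l2_normalize(emb)
    norm_w = np.linalg.norm(centroid)
    u = centroid / norm_w
    cos = x_hat @ u                                    # cosine score per sample
    z = np.where(labels == 0, m_bona - cos, cos - m_spoof)
    sign = np.where(labels == 0, -1.0, 1.0)            # dz / dcos
    loss = np.mean(np.logaddexp(0.0, alpha * z))       # softplus(alpha * z)
    # d softplus(alpha z) / dz = alpha * sigmoid(alpha z), computed stably
    dz = alpha * np.exp(-np.logaddexp(0.0, -alpha * z))
    # dcos / dw = (x_hat - cos * u) / ||w||
    dcos_dw = (x_hat - cos[:, None] * u[None, :]) / norm_w
    grad = np.mean((dz * sign)[:, None] * dcos_dw, axis=0)
    return float(loss), grad


class AdamCentroid:
    """Learnable bonafide centroid updated by a hand-written Adam step
    (an assumed reading of "Adam-based centroid updates")."""

    def __init__(self, dim, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
        self.w = np.random.default_rng(seed).standard_normal(dim)
        self.m = np.zeros(dim)   # first-moment estimate
        self.v = np.zeros(dim)   # second-moment estimate
        self.t = 0
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps

    def step(self, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)   # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)
        self.w -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


# Synthetic demo: a compact bonafide cluster and dispersed spoof samples.
rng = np.random.default_rng(1)
direction = l2_normalize(rng.standard_normal(16))
bona = direction + 0.1 * rng.standard_normal((64, 16))
spoof = -direction + 0.5 * rng.standard_normal((64, 16))
emb = np.vstack([bona, spoof])
labels = np.array([0] * 64 + [1] * 64)

centroid = AdamCentroid(dim=16)
initial_loss, _ = oc_softmax(emb, labels, centroid.w)
for _ in range(300):
    _, grad = oc_softmax(emb, labels, centroid.w)
    centroid.step(grad)
final_loss, _ = oc_softmax(emb, labels, centroid.w)

view2 = bona + 0.05 * rng.standard_normal(bona.shape)  # stand-in augmented view
print(f"OC-Softmax loss: {initial_loss:.3f} -> {final_loss:.3f}")
print(f"BPL loss between bonafide views: {bpl_loss(bona, view2):.4f}")
```

In actual training, both terms would be backpropagated into the embedding network, so the BPL term would pull augmented bonafide views together while the centroid loss shapes the whole distribution. Here only the centroid is optimized, which isolates the effect of the Adam update.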