ISCA Archive ASVspoof 2024

A single end-to-end voice anti-spoofing model with graph attention and feature aggregation for ASVspoof 5 Challenge

Weijiang Xia, Haipeng Peng, Lixiang Li, Yeqing Ren

In this paper, we submit the scores of our single model to the ASVspoof 5 Challenge Track 1 deepfake (DF) task under the closed condition. Voice anti-spoofing detection has long been an important research topic for protecting voice security, and the ASVspoof 5 organizing committee organized this challenge to promote its development. Building on our previous research, we design a single end-to-end model with graph attention and feature aggregation (SEMAA). We propose a new higher-order two-dimensional attentive statistics pooling (H2D-ASP) module to extract and aggregate more attention representations in the spectral and temporal domains. We also propose a new channel-dependent self-attention based graph aggregation (CSA-GA) module, which squeezes and aggregates spectral graphs and temporal graphs. We use a variant of AASIST as the backbone network, and the two proposed modules improve the vanilla model by 29.75%. Our single model performs 18.28% better than the single baseline model B02 on minDCF of the Eval set, and achieves minDCF of 0.2994, 0.3604, and 0.581 on the Dev, Eval_prog, and Eval sets, respectively.
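The percentage figures above are presumably relative reductions in minDCF, a lower-is-better metric. As a minimal sketch under that assumption, the arithmetic can be checked as follows. The function names are hypothetical, and the B02 value computed here is only the baseline implied by the two numbers quoted in the abstract, not an officially reported score.

```python
def relative_improvement(baseline: float, system: float) -> float:
    """Relative reduction of a lower-is-better metric such as minDCF."""
    return (baseline - system) / baseline


def implied_baseline(system: float, improvement: float) -> float:
    """Baseline value implied by a system score and a relative improvement."""
    return system / (1.0 - improvement)


# Values quoted in the abstract (Eval set).
eval_min_dcf = 0.581
reported_gain = 0.1828  # "18.28% better than B02"

# Assumption: the gain is (baseline - system) / baseline.
b02_implied = implied_baseline(eval_min_dcf, reported_gain)

print(f"implied B02 minDCF on Eval: {b02_implied:.4f}")
print(f"round-trip gain: {relative_improvement(b02_implied, eval_min_dcf):.2%}")
```

The same relation would apply to the 29.75% figure for the two proposed modules, although the abstract does not state which data set or metric that comparison uses.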