ISCA Archive Interspeech 2023

Self-Supervised Dataset Pruning for Efficient Training in Audio Anti-spoofing

Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza

The computational cost of training neural anti-spoofing models has increased rapidly due to larger network architectures. Several dataset-pruning metrics have been proposed to increase the training efficiency of these models. However, these metrics require example labels and an initial training step to compute example scores, which is computationally intensive. We propose a novel self-supervised pruning metric for efficient dataset pruning in neural anti-spoofing models. Our method identifies important examples and prunes the dataset in an efficient, self-supervised manner using the clustered embedding representation of the audio samples. We demonstrate that our method exceeds the performance of four other pruning metrics on the ASVspoof 2019 dataset across two anti-spoofing models while being 91% more computationally efficient. We also find differences in the distribution of certain attacks, which helps explain the better performance of self-supervised pruning over other metrics.
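
As a rough illustration of pruning with clustered embeddings, the sketch below scores each example by its distance to the nearest cluster centroid and keeps the highest-scoring fraction. The abstract does not specify the embedding model, the clustering algorithm, or the scoring rule, so everything here is an assumption: the plain k-means routine, the distance-to-centroid score, the choice to keep the farthest examples, and the names `kmeans`, `prune_by_centroid_distance`, `keep_fraction`, and `k` are all hypothetical. It is one plausible reading, not the authors' implementation.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the clustering step)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct examples (fancy indexing copies).
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every example to every centroid.
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid as is
                centroids[j] = members.mean(axis=0)
    return centroids


def prune_by_centroid_distance(embeddings, keep_fraction, k=8, seed=0):
    """Return sorted indices of kept examples and the per-example scores.

    Assumption: the score is the distance to the nearest centroid, and the
    examples farthest from any centroid are treated as the important ones.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    centroids = kmeans(embeddings, k, seed=seed)
    d = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)
    scores = d.min(axis=1)
    n_keep = max(1, int(round(keep_fraction * len(embeddings))))
    keep = np.argsort(scores)[::-1][:n_keep]  # highest scores first
    return np.sort(keep), scores
```

In practice, `embeddings` would be one fixed-length vector per utterance taken from a pretrained self-supervised audio encoder. No labels and no training pass over the anti-spoofing model are needed to compute the scores, which is the property the abstract emphasises.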