ISCA Archive Interspeech 2024

Adapter Learning from Pre-trained Model for Robust Spoof Speech Detection

Haochen Wu, Wu Guo, Shengyu Peng, Zhuhai Li, Jie Zhang

Speech anti-spoofing models can be improved by using large pre-trained models as front-ends, e.g., Wav2vec2 or WavLM. However, apart from the heavy computational overhead, fine-tuning a pre-trained model is prone to over-fitting and catastrophic forgetting due to limited training data. In this paper, we propose a novel adapter learning framework based on a pre-trained model for robust spoof speech detection. We consider two adapter cases, i.e., intra-block adapters and cross-block adapters, which are inserted into or appended to the backbone Wav2vec2. During training, the backbone is frozen and only the adapter parameters are updated. The proposed adapter learning captures local-global task-dependent information for spoof speech detection with only a marginal increase in parameters. Results on three benchmark datasets validate the superiority of the proposed method over the baseline and existing SOTA systems.
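A minimal sketch of the adapter idea summarized above, assuming a standard bottleneck (down-project, nonlinearity, up-project, residual) adapter design, a toy stand-in for the frozen Wav2vec2 transformer blocks, and simple layer averaging before a cross-block adapter; the module names, bottleneck size, and aggregation choice are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual bottleneck adapter: down-project, nonlinearity, up-project."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual connection

class AdapterWrappedBlock(nn.Module):
    """Wraps one frozen transformer block with an intra-block adapter."""
    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        self.adapter = BottleneckAdapter(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.block(x))

# Toy "backbone" standing in for the Wav2vec2 transformer stack (assumption).
dim, n_layers = 768, 4
backbone = nn.ModuleList(
    [nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True) for _ in range(n_layers)]
)
for p in backbone.parameters():
    p.requires_grad = False  # backbone stays frozen; only adapters are trained

wrapped = nn.ModuleList([AdapterWrappedBlock(b, dim) for b in backbone])
cross_block_adapter = BottleneckAdapter(dim)  # applied to aggregated layer outputs

x = torch.randn(2, 100, dim)  # (batch, frames, features)
hidden = []
for layer in wrapped:
    x = layer(x)
    hidden.append(x)
pooled = torch.stack(hidden).mean(dim=0)  # combine per-layer outputs (illustrative choice)
out = cross_block_adapter(pooled)

trainable = [p for p in wrapped.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable))  # only the adapter parameters are trainable
```

Because the backbone parameters keep `requires_grad=False`, an optimizer built over only the adapter parameters updates a small fraction of the total model, which is what limits over-fitting and avoids catastrophic forgetting of the pre-trained representation.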