ISCA Archive Interspeech 2022

Multi-Path GMM-MobileNet Based on Attack Algorithms and Codecs for Synthetic Speech and Deepfake Detection

Yan Wen, Zhenchun Lei, Yingen Yang, Changhong Liu, Minglei Ma

The generalization ability of speech spoofing detection systems to real, unseen sources remains a major challenge. Spoofed speech produced by different attack algorithms or codecs has different feature distributions, each of which is a variant of the distribution of genuine speech. The conventional GMM describes the common distribution of all speech features, but it does not capture the specificity of speech generated by a particular attack algorithm or codec, which may be useful for modeling the feature distribution of speech from unknown sources. We propose the multi-path GMM-MobileNet (M-GMM-MobileNet) model, which includes GMMs trained separately on genuine speech and on spoofed speech generated using various attack algorithms or codecs. A 1-D variant of the MobileNet structure is used to extract embedding vectors, and the multi-path structure is used to improve generalization. On the ASVspoof 2021 LA task, M-GMM-MobileNet achieves a minimum t-DCF of 0.3231 and an EER of 6.80%, relative reductions of 6.2% and 26.6%, respectively, compared with the LFCC-LCNN baseline. On the ASVspoof 2021 DF task, M-GMM-MobileNet achieves an EER of 16.86%, a relative reduction of 24.7% compared with the RawNet2 baseline. Our model is also competitive with the systems on the ASVspoof 2021 DF leaderboard.
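
As a rough illustration of the multi-path idea, the following Python sketch scores one utterance against several diagonal-covariance GMMs, one per source, and keeps the per-component log-probabilities of each frame as that path's feature map. This is only one plausible reading of the abstract: the choice of feature, the function names gmm_component_log_probs and multi_path_features, the path labels, the feature dimension, and the randomly generated GMM parameters are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def gmm_component_log_probs(frames, weights, means, variances):
    """Return log(w_k * N(x_t | mu_k, diag(var_k))) for every frame and component.

    frames: (T, D) acoustic features; weights: (K,); means, variances: (K, D).
    Output has shape (T, K).
    """
    diff = frames[:, None, :] - means[None, :, :]                # (T, K, D)
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)  # (K,)
    log_exp = -0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)  # (T, K)
    return np.log(weights)[None, :] + log_norm[None, :] + log_exp


def multi_path_features(frames, gmms):
    """Score the same frames against every GMM, giving one (T, K) map per path."""
    return {name: gmm_component_log_probs(frames, *params)
            for name, params in gmms.items()}


def make_random_gmm(rng, n_components, dim):
    """Hypothetical stand-in for a trained GMM (parameters are random, not fitted)."""
    weights = rng.dirichlet(np.ones(n_components))
    means = rng.normal(size=(n_components, dim))
    variances = rng.uniform(0.5, 1.5, size=(n_components, dim))
    return weights, means, variances


rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 4))  # 50 frames of 4-dim features (assumed sizes)

# One GMM per source; the labels are placeholders, not names from the paper.
gmms = {
    "genuine": make_random_gmm(rng, 8, 4),
    "attack_A": make_random_gmm(rng, 8, 4),
    "codec_B": make_random_gmm(rng, 8, 4),
}

features = multi_path_features(frames, gmms)
for name, fmap in features.items():
    print(name, fmap.shape)
```

In the model described above, each such map would presumably be passed to its own 1-D MobileNet branch to produce an embedding; that network is not sketched here.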