ISCA Archive Interspeech 2023

Group GMM-ResNet for Detection of Synthetic Speech Attacks

Zhenchun Lei, Yan Wen, Yingen Yang, Changhong Liu, Minglei Ma

CNN-based models have achieved remarkable success in speaker recognition and spoofed speech detection. We propose the group GMM-ResNet for synthetic speech detection. The grouping technique improves classification accuracy by exposing the group cardinality while reducing both the number of parameters and the training time, and it allows the model to jointly attend to information from different representation subspaces. We propose two grouping methods, both based on the Gaussian components of the GMM, which is trained using the binary splitting method. On the ASVspoof 2021 LA task, the group GMM-ResNet achieves a minimum t-DCF of 0.2450 and an EER of 2.53%, relative reductions of 28.9% and 72.7%, respectively, compared with the LFCC-LCNN baseline. On the ASVspoof 2021 DF task, it achieves an EER of 15.96%, a relative reduction of 28.7% compared with the RawNet2 baseline.
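The binary splitting method mentioned above grows a GMM by repeatedly doubling its number of components and refining them with EM. The following is a minimal sketch of the generic LBG-style procedure, not the authors' exact recipe. It assumes diagonal covariances and NumPy, and it treats the perturbation size `eps`, the number of EM iterations per stage `n_iter`, and the variance floor as illustrative choices that the abstract does not specify. The function names `em_step` and `train_gmm_binary_split` are hypothetical.

```python
import numpy as np

def em_step(x, w, mu, var, floor=1e-6):
    """One EM iteration for a diagonal-covariance GMM (hypothetical helper).
    x: (N, D) frames; w: (K,) weights; mu, var: (K, D) means and variances."""
    # Log-density of every frame under every diagonal Gaussian, shape (N, K).
    log_p = -0.5 * (np.sum(np.log(2 * np.pi * var), axis=1)
                    + np.sum((x[:, None, :] - mu[None, :, :]) ** 2 / var[None, :, :], axis=2))
    log_p += np.log(w)
    # E-step: normalise to responsibilities with a stable log-sum-exp.
    m = log_p.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
    gamma = np.exp(log_p - log_norm)
    # M-step: re-estimate weights, means and variances (variances floored).
    nk = gamma.sum(axis=0) + 1e-10
    w = nk / nk.sum()
    mu = (gamma.T @ x) / nk[:, None]
    var = np.maximum((gamma.T @ (x ** 2)) / nk[:, None] - mu ** 2, floor)
    return w, mu, var

def train_gmm_binary_split(x, n_components, n_iter=10, eps=0.2):
    """Grow a GMM from 1 to n_components (a power of two) by binary splitting.
    eps and n_iter are assumed values, not taken from the paper."""
    if n_components < 1 or n_components & (n_components - 1):
        raise ValueError("n_components must be a power of two")
    # Start from a single Gaussian fitted to all the data.
    w = np.ones(1)
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True) + 1e-6
    while len(w) < n_components:
        # Split every component: shift its mean by -/+ eps standard deviations,
        # copy its variance, and halve its weight so the weights still sum to 1.
        offset = eps * np.sqrt(var)
        mu = np.concatenate([mu - offset, mu + offset])
        var = np.concatenate([var, var])
        w = np.concatenate([w, w]) / 2.0
        # Refine the doubled model with a few EM iterations.
        for _ in range(n_iter):
            w, mu, var = em_step(x, w, mu, var)
    return w, mu, var
```

For example, on two well-separated 2-D clusters, `train_gmm_binary_split(x, 2)` should place one mean near each cluster centre. Because each split doubles the component count, the final GMM always has a power-of-two number of Gaussians. The grouping methods in the paper operate on these Gaussian components, but their exact definition is not given in this abstract.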