ISCA Archive Interspeech 2024

DB-PMAE: Dual-Branch Prototypical Masked AutoEncoder with locality for domain robust speaker verification

Wei-lin Xie, Yu-Xuan Xi, Yan Song, Jian-tao Zhang, Hao-yu Song, Ian McLoughlin

Existing speaker verification (SV) systems mainly consist of a frontend deep embedding network pretrained for speaker identification (SID), followed by a backend network finetuned to provide a similarity measure. Despite their success, their performance may degrade markedly under domain mismatch. In this paper, we present a novel dual-branch prototypical masked autoencoder (DB-PMAE) based SRE framework. Specifically, teacher and student branches with siamese encoders are pretrained to jointly learn patch-level features and prototypes. A multi-task learning framework is then used for finetuning on the SID and SV tasks, where similarity is measured by finding local correspondences to improve domain robustness. Experiments on the CNCeleb corpus demonstrate the superiority of DB-PMAE.
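
To illustrate what measuring similarity through local correspondence could look like, the following is a minimal sketch and not the scoring method of DB-PMAE itself, which the abstract does not specify. It assumes each utterance is represented by a list of patch-level feature vectors, matches every enrollment patch to its most similar test patch by cosine similarity, and averages the matched scores. The function names `cosine` and `local_correspondence_score`, and the max-then-mean matching rule, are hypothetical choices made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def local_correspondence_score(enroll_patches, test_patches):
    """Hypothetical scoring rule (an assumption, not the paper's definition):
    match each enrollment patch to its best test patch, then average."""
    if not enroll_patches or not test_patches:
        raise ValueError("both utterances need at least one patch feature")
    best_matches = [
        max(cosine(p, q) for q in test_patches)  # best local match for patch p
        for p in enroll_patches
    ]
    return sum(best_matches) / len(best_matches)


# Toy patch features: the same two directions, in a different order and scale.
enroll = [[1.0, 0.0], [0.0, 1.0]]
test = [[0.0, 2.0], [3.0, 0.0]]
score = local_correspondence_score(enroll, test)
```

Because each patch is matched independently, the score in this sketch does not depend on the order of the patches, whereas a single pooled utterance-level embedding would discard such local structure before comparison.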