ISCA Archive Interspeech 2025

A Domain Robust Pre-Training Method with Local Prototypes for Speaker Verification

Qing Gu, Yan Song, Haoyu Song, Nan Jiang, Lirong Dai, Ian McLoughlin

Existing self-supervised methods for speaker verification (SV) have demonstrated strong potential by training on large-scale unlabeled speech data to learn effective speaker embeddings. However, most rely on utterance-level contrastive learning or self-distillation, which fail to adequately account for domain shifts caused by differing speaking styles and languages. In this paper, we propose a novel domain-robust pre-training method with local prototypes for SV. Specifically, we introduce a self-distillation framework built on a Transformer-based encoder for local feature learning. Domain-agnostic pre-training derives local prototypes through online clustering, and domain-aware alignment is then applied to learn domain-robust local features. After fine-tuning with utterance-level supervision, results on the CNCeleb and VoxCeleb benchmarks demonstrate the effectiveness of the proposed method.
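The abstract does not specify the clustering algorithm, so as a minimal illustration of the general idea, the sketch below shows one way local prototypes could be maintained by online clustering: frame-level features are assigned to their nearest prototype by cosine similarity, and matched prototypes are updated with an exponential moving average. The function name, the hard-assignment rule, and the EMA update are all assumptions for illustration, not the paper's actual procedure.

```python
import numpy as np

def update_local_prototypes(features, prototypes, momentum=0.99):
    """One illustrative online-clustering step (hypothetical, not the
    paper's method): assign L2-normalised local (frame-level) features
    to their nearest prototype by cosine similarity, then pull each
    matched prototype toward the mean of its assigned features with an
    exponential moving average.

    features   : (N, D) array of frame-level embeddings
    prototypes : (K, D) array of current prototype vectors
    Returns the hard assignments and the updated, re-normalised prototypes.
    """
    # L2-normalise so dot products equal cosine similarities
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)

    # Hard assignment: nearest prototype for each local feature
    sims = f @ p.T                  # (N, K) cosine similarities
    assign = sims.argmax(axis=1)    # (N,) prototype index per feature

    # EMA update of each prototype that received at least one feature
    new_p = p.copy()
    for k in np.unique(assign):
        members_mean = f[assign == k].mean(axis=0)
        new_p[k] = momentum * p[k] + (1.0 - momentum) * members_mean

    # Keep prototypes on the unit hypersphere
    new_p /= np.linalg.norm(new_p, axis=1, keepdims=True)
    return assign, new_p
```

In practice such an update would run inside the pre-training loop on encoder outputs; prototypes left unmatched in a step are simply carried over unchanged.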