ISCA Archive Interspeech 2025

Automatic Labeling and Correction of Noisy Labels for Robust Self-Supervised Speaker Verification

Abderrahim Fathan, Jahangir Alam

Supervised speaker verification relies on large labeled datasets, which are costly and labor-intensive to create. Moreover, both manual and clustering-based labeling methods introduce label noise, which degrades model generalization. To leverage unlabeled speech data, we propose a framework that automatically generates and refines pseudo speaker labels. It first generates pseudo-labels with a clustering algorithm. It then trains a speaker verification system that improves the quality of the pseudo-labeled data, using self-supervised learning and a neural embedding extractor optimized with a refined loss function. This loss integrates a dynamic, adaptive label-noise cleansing method, termed AdaptiveDropSC, which tracks dominant sub-centers via a dictionary table for better label correction. Experiments on the VoxCeleb corpus show that our method improves pseudo-labeling accuracy across different clustering techniques and achieves state-of-the-art performance in self-supervised speaker verification.
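The idea of tracking dominant sub-centers through a dictionary table can be illustrated with a minimal sketch. It is not the authors' AdaptiveDropSC and does not model its dynamic or adaptive behaviour. It assumes that each pseudo-label owns several sub-center vectors and that a sample whose nearest sub-center is not the most frequently selected one for its pseudo-label is a candidate noisy label. The function names `nearest_subcenter`, `update_dominant_table` and `flag_noisy` are hypothetical, and only the Python standard library is used.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def nearest_subcenter(embedding, subcenters):
    """Index of the sub-center with the highest cosine similarity to the embedding."""
    sims = [cosine(embedding, c) for c in subcenters]
    return max(range(len(sims)), key=sims.__getitem__)


def update_dominant_table(embeddings, labels, centers):
    """Hypothetical dictionary table: pseudo-label -> most frequently selected sub-center."""
    counts = defaultdict(Counter)
    for emb, lab in zip(embeddings, labels):
        counts[lab][nearest_subcenter(emb, centers[lab])] += 1
    return {lab: c.most_common(1)[0][0] for lab, c in counts.items()}


def flag_noisy(embeddings, labels, centers, table):
    """Flag samples whose nearest sub-center is not the dominant one for their pseudo-label."""
    return [nearest_subcenter(emb, centers[lab]) != table[lab]
            for emb, lab in zip(embeddings, labels)]


if __name__ == "__main__":
    # Two pseudo-labels, each with two illustrative 2-D sub-centers (assumed values).
    centers = {0: [[1.0, 0.0], [0.0, 1.0]], 1: [[-1.0, 0.0], [0.0, -1.0]]}
    embs = [[0.9, 0.1], [1.0, 0.05], [0.1, 0.9], [-1.0, 0.1]]
    labels = [0, 0, 0, 1]
    table = update_dominant_table(embs, labels, centers)
    print(flag_noisy(embs, labels, centers, table))  # [False, False, True, False]
```

In this toy example, the third sample carries pseudo-label 0 but sits closest to the non-dominant sub-center, so it is flagged. A real system would then drop or relabel such samples on a schedule and recompute the table during training.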