ISCA Archive Interspeech 2024

Self-Supervised Speaker Verification with Mini-Batch Prediction Correction

Junxu Wang, Zhihua Fang, Liang He

Applying self-supervised learning to speaker verification remains challenging. In the two-stage solution, the clustering-iteration step in stage 2 determines the upper bound of system performance. Because the pseudo-labels obtained through clustering contain considerable noise, we propose a new method for learning with noisy pseudo-labels that operates on mini-batches: a unified alignment method, based on the model's predicted mean and an exponential moving average, determines which samples with noisy pseudo-labels can be rectified. In addition, we explore different iterative training methods and propose a training method that accounts for the effects of both re-clustering and noisy pseudo-labels. By combining these techniques, our system achieves results similar to or better than those of previous studies.
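As a rough illustration of mini-batch pseudo-label correction with an exponential moving average, the sketch below relabels a sample only when the current prediction and its averaged prediction agree confidently on a class other than the clustered pseudo-label. The abstract does not specify the actual alignment rule, so the agreement test, the function names, and the `momentum` and `threshold` values are all assumptions made for illustration and should not be read as the proposed method.

```python
import numpy as np


def ema_update(ema_probs, probs, momentum=0.9):
    """Exponential moving average of per-sample class posteriors.

    `momentum` is a hypothetical value chosen for illustration.
    """
    return momentum * ema_probs + (1.0 - momentum) * probs


def correct_batch_labels(pseudo_labels, probs, ema_probs, threshold=0.8):
    """Rectify noisy pseudo-labels inside one mini-batch.

    Assumed rule (not taken from the paper): a sample is relabeled only if
    the current prediction and the EMA prediction pick the same class, that
    class differs from the pseudo-label, and both are at least `threshold`.
    """
    idx = np.arange(len(pseudo_labels))
    cur_cls = probs.argmax(axis=1)
    ema_cls = ema_probs.argmax(axis=1)
    confident = (probs[idx, cur_cls] >= threshold) & (
        ema_probs[idx, ema_cls] >= threshold
    )
    rectify = (cur_cls == ema_cls) & (cur_cls != pseudo_labels) & confident
    corrected = np.where(rectify, cur_cls, pseudo_labels)
    return corrected, rectify


# Toy mini-batch: 3 utterances, 3 pseudo-speaker clusters.
pseudo = np.array([0, 1, 2])
probs = np.array([[0.05, 0.90, 0.05],
                  [0.10, 0.85, 0.05],
                  [0.50, 0.30, 0.20]])
ema = np.array([[0.10, 0.85, 0.05],
                [0.05, 0.90, 0.05],
                [0.40, 0.40, 0.20]])
corrected, mask = correct_batch_labels(pseudo, probs, ema)
```

In this toy batch only the first utterance is relabeled. The second already matches its pseudo-label, and the third is left unchanged because neither prediction is confident enough.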