ISCA Archive Interspeech 2020

How Does Label Noise Affect the Quality of Speaker Embeddings?

Minh Pham, Zeqian Li, Jacob Whitehill

A common assumption when collecting speech datasets is that the accuracy of data labels strongly influences the accuracy of speaker embedding models and verification systems trained on these data. However, we show in experiments on the large and diverse VoxCeleb2 dataset that this is not always the case: under four different label noise models (Split, Merge, Permute, and Corrupt), we find that the impact on trained speaker embedding models, as measured by the Equal Error Rate (EER) of speaker verification, is mild (just a few percent absolute error increase), except with very large amounts of noise (i.e., every minibatch is almost completely corrupted). This suggests that efforts to collect speech datasets might benefit more from ensuring large size and diversity than from meticulous labeling.
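For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected). The following is a minimal sketch of that standard definition, not the evaluation code used in the experiments; the function name `equal_error_rate` and the toy scores are illustrative assumptions.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER from verification trial scores.

    scores: similarity scores, higher means "more likely the same speaker".
    labels: truthy for same-speaker (target) trials, falsy for impostor trials.
    Hypothetical helper for illustration only; not taken from the paper.
    """
    target = [s for s, y in zip(scores, labels) if y]
    nontarget = [s for s, y in zip(scores, labels) if not y]
    if not target or not nontarget:
        raise ValueError("need at least one target and one impostor trial")

    best_gap, eer = None, None
    # Sweep every observed score as a threshold, plus +inf (reject everything).
    for threshold in sorted(set(scores)) + [float("inf")]:
        # A trial is accepted when its score is at or above the threshold.
        far = sum(s >= threshold for s in nontarget) / len(nontarget)
        frr = sum(s < threshold for s in target) / len(target)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy example: three target trials and three impostor trials.
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)
```

With a finite set of trials the two error rates rarely cross exactly, so this sketch reports the average of the two rates at the threshold where they are closest; other implementations interpolate along the ROC curve instead.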