ISCA Archive SPSC 2025

Optimizing the Dataset for the Privacy Evaluation of Speaker Anonymizers

Carlos Franzreb, Arnab Das, Tim Polzehl, Sebastian Möller
Evaluating speaker anonymizers requires simulating an attack to identify anonymized speakers, as there is no ground truth to evaluate against. The reliability of this evaluation depends on the data used to train and evaluate the attack. At the same time, the design of anonymizers would benefit from faster evaluations, whose runtime depends mostly on the size of the training data. Hence the question: how much data does the attacker require to estimate an anonymizer’s privacy reliably and efficiently? Considering four diverse anonymizers, we first experiment with different sizes and configurations for the evaluation dataset. Our results highlight the attacker’s robustness: its performance does not degrade when more speakers are evaluated. Having defined a reliable evaluation, we then aim to improve its runtime by reducing the size of the training data. We show that 20% of the training data can be discarded with minimal degradation of the attack’s efficacy.