The success of deep learning in speech and speaker recognition relies heavily on large datasets. However, ethical, privacy, and legal concerns arise when such datasets are collected from real human speech. In particular, there are significant concerns when collecting many speakers' speech data from the web.
On the other hand, recent generative models can now produce synthesized speech of very high quality. Can we 'generate' large, privacy-aware, unbiased, and fair datasets with speech-generative models? Such studies have begun not only for speech datasets but also for facial image datasets.
In this talk, I will introduce our efforts to construct SynVox2, a synthetic, speaker-anonymised, and privacy-aware version of the VoxCeleb2 dataset. In addition to the procedures and methods used in its construction, I will discuss the challenges and problems of using synthetic data, illustrated by the performance and fairness of a speaker verification system built on the SynVox2 database.