ISCA Archive Interspeech 2023
ISCA Archive Interspeech 2023

Vietnam-Celeb: a large-scale dataset for Vietnamese speaker recognition

Viet Thanh Pham, Xuan Thai Hoa Nguyen, Vu Hoang, Thi Thu Trang Nguyen

The success of speaker recognition systems heavily depends on large training datasets collected under real-world conditions. While common languages like English or Chinese have vastly available datasets, low-resource ones like Vietnamese remain limited. This paper presents a large-scale spontaneous dataset gathered under noisy environments, with over 87,000 utterances from 1,000 Vietnamese speakers of many professions, covering 3 main Vietnamese dialects. To build the dataset, we propose a sophisticated construction pipeline that can also be applied to other languages, with efficient visual-aided processing techniques to boost data precision. With the state-of-the-art x-vector model, training with the proposed dataset shows an average absolute and relative EER improvement of 5.48% and 41.61% when compared to the model trained on VLSP 2021, a publicly available Vietnamese speaker dataset.