ISCA Archive Interspeech 2023
ISCA Archive Interspeech 2023

HABLA: A Dataset of Latin American Spanish Accents for Voice Anti-spoofing

Pablo Andrés Tamayo Flórez, Rubén Manrique, Bernardo Pereira Nunes

Research on improving automatic speaker verification systems to detect speech spoofing has focused mainly on English, with little attention given to other languages creating a significant gap in language coverage. This paper introduces HABLA, the first voice anti-spoofing dataset in the Spanish language including Argentinian, Colombian, Peruvian, Venezuelan, and Chilean accents. The dataset provided by HABLA comprises over 22,000 authentic speech samples from male and female speakers hailing from five distinct Latin American nations as well as 58,000 spoof samples that were generated through the use of six different speech synthesis strategies, including recent voice conversion and text-to-speech algorithms. Finally, initial findings on the efficacy of pre-existing Antispoofing Systems models are presented along with concerns regarding their performance in languages other than English.