ISCA Archive Interspeech 2025

Thai Speech Spoofing Detection Dataset with Variations in Speaking Styles

Ticho Urai, Pachara Boonsarngsuk, Ekapol Chuangsuwanich

We present the Chula Spoofed Speech (CSS) dataset, a speech spoofing detection dataset for Thai that contains 1,332,120 utterances of bona fide and synthetic speech. The synthetic samples were generated with five distinct high-quality text-to-speech (TTS) systems, all based on the same utterances as the bona fide data. The data covers a range of ages and speaking styles. We train strong baselines, such as AASIST and RawNet2, under different conditions to identify the factors that affect model performance. In addition to unseen attacks, unseen speaking styles have a large impact on performance, indicating that anti-spoofing datasets need diversity in speaking styles. We further evaluate the models in telephony scenarios against additional TTS systems; the results show that the models still face challenges in this setting.