ISCA Archive Interspeech 2025

Turing's Echo: Investigating Linguistic Sensitivity of Deepfake Voice Detection via Gamification

Binh Nguyen, Thai Le

Recent advancements in text-to-speech technology have made it easier than ever to synthesize high-fidelity, human-like audio, commonly referred to as deepfake speech. While this technology benefits many applications, it has also introduced significant risks, especially by enabling hyper-realistic impersonation threats. Existing research has developed robust algorithms capable of detecting deepfake speech under various acoustic conditions, such as pitch shift and background noise. However, the indirect impact of linguistic factors in a deepfake speech's transcript, such as word choice, grammar, and sentence structure, on the performance of deepfake detectors remains unexplored. As a first step toward bridging this gap, this paper introduces a gamified research prototype, called TURING'S ECHO, to evaluate (1) how humans perceive such linguistic sensitivity in comparison with machines and (2) how robust state-of-the-art deepfake speech detection is under different linguistic nuances.