ISCA Archive Interspeech 2023

Exploring Sources of Racial Bias in Automatic Speech Recognition through the Lens of Rhythmic Variation

Li-Fang Lai, Nicole Holliday

Although studies have shown that modern automatic speech recognition (ASR) technologies perform worse for African American English (AAE) speakers, the mechanism by which systems fail for these speakers is still not well understood. The present study aims to offer insight into this issue by examining whether errors are driven by rhythmic variation across ethnolects. We computed seven quantitative measures of speech rhythm from a reading task produced by AAE and General American English (GAE) speakers and related these metrics to word error rates. The results confirmed racial bias against AAE speakers, with higher error rates when AAE speakers produced more variable vowel durations. Rhythmic variation, by contrast, was not a contributing factor for errors in GAE. These results call for interdisciplinary collaboration between linguists and ASR builders to incorporate timing components of speech into ASR systems and to ensure fairness in artificial intelligence for currently underserved groups.
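To make the two quantities above concrete, the following is a minimal sketch, not the authors' pipeline. The abstract does not name its seven rhythm measures, so the two shown here (a rate-normalized variability coefficient, often called Varco, and the normalized Pairwise Variability Index, nPVI) are assumptions chosen because they are widely used for vowel-duration variability. The function names and the use of the population standard deviation are likewise illustrative choices. Word error rate is computed in the standard way, as word-level edit distance divided by the number of reference words.

```python
from statistics import mean, pstdev


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def varco(durations: list[float]) -> float:
    """Standard deviation of interval durations as a percentage of their mean.

    Assumption: population standard deviation; some studies use the sample form.
    """
    return 100 * pstdev(durations) / mean(durations)


def npvi(durations: list[float]) -> float:
    """Normalized Pairwise Variability Index over successive intervals."""
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    terms = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * mean(terms)
```

Under these assumptions, each speaker's vowel durations (in seconds, from a forced alignment or manual segmentation) would yield one value per metric, which could then be related to that speaker's word error rate, for example in a regression fitted separately for each dialect group.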