ISCA Archive Interspeech 2024

Voice Quality Variation in AAE: An Additional Challenge for Addressing Bias in ASR Models?

Li-Fang Lai, Nicole Holliday

Creaky voice, a non-modal phonation type often stigmatized in the U.S. media, has become increasingly prevalent in the speech of young Americans across ethnic and regional groups. This paper aims to add to our knowledge of voice quality variation and its interaction with ASR by conducting three analyses on a new African American English (AAE) dataset. Acoustic analyses show robust differences between creaky and modal voice, suggesting cross-ethnic similarity in vocal fold articulation between AAE and Mainstream American English (MAE) speakers. In addition, we observed gender differences in creak production, both quantitatively (women produced more creak than men) and qualitatively (women: medial partial creaks; men: final full creaks). This indicates that young female AAE speakers are participating in the phonation change taking place in MAE. We also found that the creakier the speech, the more errors the ASR output contained, suggesting the importance of incorporating voice quality into ASR systems.