ISCA Archive Interspeech 2022

Knowledge of accent differences can be used to predict speech recognition

Tuende Szalay, Mostafa Shahin, Beena Ahmed, Kirrie Ballard

If accent differences can predict speech recognition performance, a small dataset that systematically represents those differences might suffice for adapting an automatic speech recognition (ASR) system to a novel variety, and would be less resource-intensive than training the ASR on a large, unsystematic dataset. However, it is not known whether ASR errors pattern according to accent differences. Therefore, we tested the performance of Google's General American (GenAm) and Standard Australian English (SAusE) ASR on both dialects using words that systematically represent accent differences. Accent differences were quantified as differences in the number of vowel phonemes, in the phonetic quality of vowels, and in rhoticity (i.e., the presence or absence of postvocalic /r/). Our results confirm that word recognition is significantly more accurate when the ASR dialect matches the speaker dialect than when the two are mismatched. Our results also reveal that GenAm ASR is less accurate on SAusE speakers because of the higher number of vowel phonemes and the lack of postvocalic /r/ in SAusE. Thus, the amount of data needed to adapt ASR from GenAm to SAusE might be reduced by using a small dataset that focuses on differences in the size of the vowel inventory and in rhoticity.
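The matched versus mismatched comparison described above can be illustrated with a minimal sketch. It assumes each trial is recorded as a tuple of ASR dialect, speaker dialect, target word, and recognised word; the function name `accuracy_by_condition`, the tuple layout, and the trial values are all hypothetical placeholders, not the study's pipeline or data.

```python
from collections import defaultdict


def accuracy_by_condition(trials):
    """Return word recognition accuracy for each (ASR dialect, speaker dialect) pair.

    `trials` is an iterable of (asr_dialect, speaker_dialect, target, recognised)
    tuples. This layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for asr, speaker, target, recognised in trials:
        key = (asr, speaker)
        total[key] += 1
        # Count a trial as correct when the words match, ignoring case and
        # surrounding whitespace.
        if recognised.strip().lower() == target.strip().lower():
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Made-up placeholder trials, NOT data from the study.
trials = [
    ("GenAm", "GenAm", "card", "card"),
    ("GenAm", "SAusE", "card", "cod"),
    ("GenAm", "SAusE", "bead", "bead"),
    ("SAusE", "SAusE", "card", "card"),
    ("SAusE", "GenAm", "card", "card"),
    ("SAusE", "GenAm", "bead", "bid"),
]

scores = accuracy_by_condition(trials)
for (asr, speaker), acc in sorted(scores.items()):
    condition = "matched" if asr == speaker else "mismatched"
    print(f"{asr} ASR, {speaker} speaker ({condition}): {acc:.2f}")
```

Grouping by the full (ASR dialect, speaker dialect) pair, rather than only by matched versus mismatched, keeps the two mismatched directions separate, which matters when one direction is suspected to be harder than the other.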