ISCA Archive Interspeech 2022

Evaluating the Performance of State-of-the-Art ASR Systems on Non-Native English using Corpora with Extensive Language Background Variation

Samuel Hollands, Daniel Blackburn, Heidi Christensen

This investigation explores how several ASR systems perform on non-native English, using corpora with extensive language background variation. The study takes two corpora covering 191 different native language (L1) backgrounds and examines how well these systems process non-native English (L2) speech. A Transformer-based ASR system and a CRDNN architecture are both tested, trained on LibriSpeech and Common Voice for a three-way cross comparison. In addition, Google's Speech-to-Text API and AWS Transcribe are investigated in order to evaluate popular mainstream approaches, given their current degree of impact in deployed systems. Experiments reveal deficits in the range of 10%-15% in mean WER between L1 and L2 speech. Results indicate that ASR systems trained on particular varieties of L2 speech may be effective in improving WERs: in this paper, several Google ASR models trained on varieties of African L2 English outperform L1-trained ASR for under-represented dialect groups in the United Kingdom. Further research is proposed to explore the plausibility of this approach and to critically examine WER as a metric for ASR evaluation, striving instead towards metrics with greater emphasis on evaluating language for communication.
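As an illustration of the metric behind the reported L1/L2 gap, the following is a minimal sketch of word error rate (WER) computed as word-level edit distance divided by reference length, with a mean taken over utterances. The function names, the lowercase-and-split normalisation, the per-utterance averaging, and the toy transcripts are all assumptions made for illustration; they are not taken from the paper's evaluation pipeline.

```python
from typing import Iterable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalisation: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Unweighted mean of per-utterance WER (an assumed averaging scheme)."""
    scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    if not scores:
        raise ValueError("at least one (reference, hypothesis) pair is required")
    return sum(scores) / len(scores)


# Toy (reference, hypothesis) pairs, invented purely for illustration.
l1_pairs = [("the cat sat on the mat", "the cat sat on the mat")]
l2_pairs = [("the cat sat on the mat", "the cat sat on mat")]

# Absolute difference in mean WER between the two speaker groups.
wer_gap = mean_wer(l2_pairs) - mean_wer(l1_pairs)
```

Because WER weights every word error equally, a dropped function word counts the same as a misrecognised content word, which is one reason the abstract calls for metrics with more emphasis on communication.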