ISCA Archive Interspeech 2022

Combining Simple but Novel Data Augmentation Methods for Improving Conformer ASR

Ronit Damania, Christopher Homan, Emily Prud'hommeaux

In this paper, we propose several novel data augmentation methods for improving the performance of automatic speech recognition (ASR) in low-resource settings. Using a 100-hour subset of English LibriSpeech to simulate a low-resource setting, we compare the well-known SpecAugment approach to these new methods, along with several other competitive baselines. We then apply the most promising combinations of models and augmentation methods to three genuinely under-resourced languages using the 40-hour Gujarati, Tamil, and Telugu datasets from the 2021 Interspeech Low Resource Automatic Speech Recognition Challenge for Indian Languages. Our data augmentation approaches, coupled with state-of-the-art acoustic model architectures and language models, yield reductions in word error rate over SpecAugment and other competitive baselines on the LibriSpeech-100 dataset, with a particular advantage over prior models on the more challenging "other" dev and test sets. Extending this work to the low-resource Indian languages, we see large improvements over the baseline models and results comparable to those of large multilingual models.