ISCA Archive Interspeech 2022

Improved ASR Performance for Dysarthric Speech Using Two-stage Data Augmentation

Chitralekha Bhat, Ashish Panda, Helmer Strik

Machine learning (ML) and Deep Neural Networks (DNNs) have greatly advanced Automatic Speech Recognition (ASR). However, accurate ASR for dysarthric speech remains a serious challenge. The dearth of usable data remains a problem when applying ML and DNN techniques to dysarthric speech recognition. In the current research, we address this challenge with a novel two-stage data augmentation scheme that combines static and dynamic data augmentation techniques, both designed around an understanding of the characteristics of dysarthric speech. The static augmentations comprise Deep Autoencoder (DAE)-based healthy speech modification and various perturbations, whereas the dynamic augmentation consists of SpecAugment techniques modified specifically for dysarthric speech. The objective of this work is to improve ASR performance for dysarthric speech using this two-stage scheme. We evaluate the scheme with an end-to-end ASR system using a Transformer acoustic model on speech from the UA dysarthric speech corpus. We achieve an absolute improvement of 16% in word error rate (WER) over a baseline with no augmentation, reaching a final WER of 20.6%.
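To make the static/dynamic split concrete, here is a minimal sketch, assuming NumPy only. It is not the authors' method: the abstract does not specify which perturbations are used or how SpecAugment is modified for dysarthric speech, and the DAE-based healthy speech modification is not shown. The static stage is illustrated with generic speed perturbation (resampling the waveform once, offline, to create extra training copies); the dynamic stage is illustrated with standard SpecAugment-style frequency and time masking, which is re-drawn every time an utterance is fed to the model. The function names `speed_perturb` and `spec_augment` and all parameter values are hypothetical.

```python
import numpy as np

# Hypothetical sketch: generic augmentations, NOT the paper's dysarthria-specific variants.

def speed_perturb(wave: np.ndarray, factor: float) -> np.ndarray:
    """Static stage (run once, offline): resample the waveform so it plays
    `factor` times faster (factor < 1 slows it down). Like classic speed
    perturbation, this changes tempo and pitch together."""
    n_out = int(round(len(wave) / factor))
    old_t = np.arange(len(wave))
    new_t = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(new_t, old_t, wave)  # linear interpolation resampling

def spec_augment(spec: np.ndarray, rng: np.random.Generator,
                 num_freq_masks: int = 2, max_f: int = 8,
                 num_time_masks: int = 2, max_t: int = 20) -> np.ndarray:
    """Dynamic stage (run on the fly per batch): SpecAugment-style masking.
    `spec` has shape (frames, freq_bins); masking with 0 assumes
    mean-normalised features. The input array is left unchanged."""
    out = spec.copy()
    n_time, n_freq = out.shape
    for _ in range(num_freq_masks):
        f = rng.integers(0, min(max_f, n_freq) + 1)   # mask width in bins
        f0 = rng.integers(0, n_freq - f + 1)          # mask start bin
        out[:, f0:f0 + f] = 0.0
    for _ in range(num_time_masks):
        t = rng.integers(0, min(max_t, n_time) + 1)   # mask width in frames
        t0 = rng.integers(0, n_time - t + 1)          # mask start frame
        out[t0:t0 + t, :] = 0.0
    return out

rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)             # placeholder: 1 s of audio at 16 kHz
static_copies = [speed_perturb(wave, a) for a in (0.9, 1.0, 1.1)]
spec = rng.standard_normal((100, 80))         # placeholder (frames, mel bins)
masked = spec_augment(spec, rng)              # a fresh mask on every call
```

The design point the sketch tries to capture is where each stage runs: static augmentation enlarges the training set once before training, while dynamic augmentation produces a different masked view of the same utterance at every epoch.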