ISCA Archive Interspeech 2023

Whisper Features for Dysarthric Severity-Level Classification

Siddharth Rathod, Monil Charola, Akshat Vora, Yash Jogi, Hemant A. Patil

Dysarthria is a speech disorder caused by improper coordination between the brain and the muscles that produce intelligible speech. Accurately diagnosing the severity of dysarthria is critical for determining the appropriate treatment and for directing speech to suitable Automatic Speech Recognition (ASR) systems. Recently, various methods have been employed to classify dysarthria severity-levels using features such as the Short-Time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficients (MFCC). This study proposes using the encoder module of Web-scale Supervised Pretraining for Speech Recognition (WSPSR), also known as Whisper, for dysarthric severity-level classification via a transfer learning approach. Whisper is a speech recognition model trained on 680,000 hours of labeled audio data. The proposed approach achieved an accuracy of 98.02%, surpassing the accuracies achieved by MFCC (95.2%) and Linear-Frequency Cepstral Coefficients (LFCC) (96.05%).