ISCA Archive Interspeech 2024
ISCA Archive Interspeech 2024

Improving Speech-Based Dysarthria Detection using Multi-task Learning with Gradient Projection

Yan Xiong, Visar Berisha, Julie Liss, Chaitali Chakrabarti

Speech analytic models based on deep learning are popular in clinical diagnostics. However, constraints on clinical data collection and sharing place limits on available dataset sizes, which adversely impacts trained model performance. Multi-task learning (MTL) has been utilized to mitigate the effect of limited sample size by jointly training on multiple tasks that are considered to be related. However, discrepancies between clinical and non-clinical tasks can reduce MTL efficiency and can even cause it to fail, especially when there are gradient conflicts. In this paper, we enhance the performance of dysarthria detection by using MTL with an auxiliary task of learning speaker embeddings. We propose a task-specific gradient projection method to overcome gradient conflicts. Our evaluation shows that the proposed MTL paradigm outperforms both single-task learning and conventional MTL under different data availability settings.