ISCA Archive Interspeech 2024

A Cluster-based Personalized Federated Learning Strategy for End-to-End ASR of Dementia Patients

Wei-Tung Hsu, Chin-Po Chen, Yun-Shao Lin, Chi-Chun Lee

Automatic speech recognition (ASR) is crucial for all users, but adapting it to speakers with Alzheimer's disease (AD) is challenging because of irregular speech patterns and privacy concerns. Federated learning (FL), a privacy-preserving training paradigm, offers a solution. However, FL-based ASR suffers from acoustic and textual heterogeneity. Although advanced model-based and cluster-based FL methods aim to address this issue, they lack a direct mechanism for handling the high intra-speaker heterogeneity exhibited by AD individuals and for accounting for ASR-related properties. This study presents cluster-based personalized federated learning (CPFL), a strategy that mitigates heterogeneity by clustering ASR output tokens using the proposed CharDiv, a metric capturing pause and word usage distributions. Evaluation on the ADReSS challenge dataset shows a 3.6% improvement in word error rate (WER). Analysis of per-cluster WER improvements and CharDiv distributions indicates reduced heterogeneity and highlights pause usage as a potential key factor in AD-oriented ASR.
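WER, the metric behind the reported improvement, is conventionally defined as the word-level edit distance between reference and hypothesis (substitutions S, deletions D, insertions I) divided by the number of reference words N. The following is a minimal sketch of that standard definition only, assuming plain whitespace tokenization and no text normalization; the exact scoring setup used in the study is not described here, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumption: words are split on whitespace with no normalization
    (casing, punctuation, or pause/filler tokens are not handled).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words (dynamic-programming row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Illustrative (made-up) picture-description utterance: one deleted word out of five.
print(word_error_rate("the boy took a cookie", "the boy took cookie"))  # 0.2
```

Because WER is normalized by reference length, a speaker who produces many short, disfluent utterances can show a high WER even when only a few words are misrecognized, which is one reason per-cluster WER comparisons are informative.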