ISCA Archive Interspeech 2022

Federated Domain Adaptation for ASR with Full Self-Supervision

Junteng Jia, Jay Mahadeokar, Weiyi Zheng, Yuan Shangguan, Ozlem Kalinli, Frank Seide

Cross-device federated learning (FL) protects user privacy by collaboratively training a model on user devices, thereby eliminating the need to collect, store, and manually label user data. Previous works have considered cross-device FL for automatic speech recognition (ASR); however, a few important challenges have not been fully addressed. These include the lack of ground-truth ASR transcriptions and the scarcity of compute resources and network bandwidth on edge devices. In this paper, we address these two challenges. First, we propose a federated learning system to support on-device ASR adaptation with full self-supervision, which uses self-labeling together with data augmentation and filtering techniques. The proposed system can improve a strong Emformer-Transducer based ASR model pretrained on out-of-domain data, using in-domain audio without any ground-truth transcriptions. Second, to reduce the training cost, we propose a self-restricted RNN Transducer (SR-RNN-T) loss, a new variant of alignment-restricted RNN-T that uses Viterbi forced-alignment from self-supervision. To further reduce the compute and network cost, we systematically explore adapting only a subset of weights in the Emformer-Transducer. Our best training recipe achieves a 12.9% relative WER reduction over the strong out-of-domain baseline, which equals 70% of the reduction achievable with full human supervision and centralized training.

doi: 10.21437/Interspeech.2022-803

Cite as: Jia, J., Mahadeokar, J., Zheng, W., Shangguan, Y., Kalinli, O., Seide, F. (2022) Federated Domain Adaptation for ASR with Full Self-Supervision. Proc. Interspeech 2022, 536-540, doi: 10.21437/Interspeech.2022-803

@inproceedings{jia22_interspeech,
  author={Junteng Jia and Jay Mahadeokar and Weiyi Zheng and Yuan Shangguan and Ozlem Kalinli and Frank Seide},
  title={{Federated Domain Adaptation for ASR with Full Self-Supervision}},
  booktitle={Proc. Interspeech 2022},
  year={2022},
  pages={536--540},
  doi={10.21437/Interspeech.2022-803}
}