ISCA Archive Interspeech 2023

Aligning Speech Enhancement for Improving Downstream Classification Performance

Yan Xiong, Visar Berisha, Chaitali Chakrabarti

Speech-based classification models in the cloud are gaining large-scale adoption. In many applications, the background noise conditions encountered after deployment differ from those used during model training, and fine-tuning the original model on local data would likely improve performance. However, this is not always possible: the local user may not be authorized to modify the cloud-based model, or may be unable to share the data and corresponding labels required for fine-tuning. In this paper, we propose a denoiser stored locally on edge devices, together with an application-specific training scheme. It learns a custom speech enhancement scheme that aligns the local denoiser with the downstream model, without requiring access to the cloud-based weights. We evaluate the denoiser on a common classification task, keyword spotting, and demonstrate using two different architectures that the proposed scheme outperforms common speech enhancement models for different types of background noise.