ISCA Archive Interspeech 2022

OSSEM: one-shot speaker adaptive speech enhancement using meta learning

Cheng Yu, Szu-wei Fu, Tsun-An Hsieh, Yu Tsao, Mirco Ravanelli

Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers. In this study, we propose a novel meta-learning-based speaker-adaptive SE approach, called OSSEM, that aims to achieve SE model adaptation in a one-shot manner. OSSEM consists of a modified transformer SE network and a speaker-specific masking (SSM) network. In practice, the SSM network uses enrolled speaker embeddings, extracted with ECAPA-TDNN, to adjust the input features through masking. To evaluate OSSEM, we design a modified Voice Bank-DEMAND dataset in which the first noisy utterance from each speaker in the test set is used for model adaptation and the remaining utterances are used to test performance. Furthermore, we constrain the SE process to run in real time, and accordingly design OSSEM as a causal SE system. The experimental results first show that OSSEM can effectively adapt the SE model to a specific speaker using only one of his/her noisy utterances, thereby improving the SE results. Moreover, OSSEM exhibits competitive performance compared with state-of-the-art causal SE systems.
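The core adaptation mechanism described above, projecting an enrolled speaker embedding to a per-feature mask that scales the input features, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy dimensions, the single linear projection, and the sigmoid activation are all assumptions made purely for illustration; the real SSM network and the ECAPA-TDNN embedding extractor are more involved.

```python
import math
import random

def sigmoid(x):
    # Squash to (0, 1) so the mask attenuates rather than amplifies arbitrarily
    return 1.0 / (1.0 + math.exp(-x))

def ssm_mask(embedding, weights, bias):
    """Hypothetical SSM: project a speaker embedding to one mask value
    per feature dimension via a single linear layer + sigmoid."""
    return [sigmoid(sum(w * e for w, e in zip(row, embedding)) + b)
            for row, b in zip(weights, bias)]

def apply_mask(features, mask):
    """Element-wise scale every feature frame by the speaker-specific mask,
    yielding the adjusted features fed to the SE network."""
    return [[f * m for f, m in zip(frame, mask)] for frame in features]

# Toy setup: 4-dim speaker embedding, 3-dim acoustic features (both assumed)
random.seed(0)
emb = [random.gauss(0, 1) for _ in range(4)]        # stand-in for an ECAPA-TDNN embedding
W = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
b = [0.0, 0.0, 0.0]

mask = ssm_mask(emb, W, b)                          # one mask value per feature dim
frames = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]         # two noisy feature frames
masked = apply_mask(frames, mask)                   # speaker-adjusted features
```

In this one-shot setting, the embedding would come from the single enrolled noisy utterance, and the masked features would then be passed to the causal transformer SE network.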