ISCA Archive Interspeech 2024

Reducing Speech Distortion and Artifacts for Speech Enhancement by Loss Function

Haixin Guan, Wei Dai, Guangyong Wang, Xiaobin Tan, Peng Li, Jiaen Liang

Deep learning-based speech enhancement has made significant strides, but speech distortion and artifacts persist. These issues can degrade perceived auditory quality and the accuracy of speech recognition systems, particularly when lightweight models are used. This paper therefore investigates the mechanisms by which speech distortion and artifacts form, and introduces a novel combined loss function that integrates Voice Activity Detection (VAD) information and speech continuity to reduce them. In addition, a new training strategy based on the proposed loss function is designed to address the difficulty of training with this combined loss on extremely small models. Experiments on the DNS2020 dataset and real meeting data validate that the approach improves both subjective and objective speech metrics, as well as Automatic Speech Recognition (ASR) performance.