ISCA Archive Interspeech 2022
ISCA Archive Interspeech 2022

Perceptual Characteristics Based Multi-objective Model for Speech Enhancement

Chiang-Jen Peng, Yun-Ju Chan, Yih-Liang Shen, Cheng Yu, Yu Tsao, Tai-Shih Chi

Deep learning has been widely adopted for speech applications. Many studies have shown that using the multiple objective framework and learned deep features is effective for improving system performance. In this paper, we propose a perceptual characteristics based multi-objective speech enhancement (SE) algorithm that combines the conventional loss and objective losses of pitch and timbre related features. Timbre related features include frequency modulation (encoded by the pitch contour), amplitude modulation (encoded by the energy contour), and speaker identity. For the speaker identity loss, we consider the deep features derived in a speaker identification system. The proposed algorithm consists of two parts, a LSTM based SE model and CNN based multi-objective models. The objective losses are derived between speech enhanced by the SE model and clean speech and combined with the SE loss for updating the SE model. The proposed algorithm is evaluated using the corpus of Taiwan Mandarin hearing in noise test (TMHINT). Experimental results show the proposed algorithm evidently outperforms the original SE model in all objective scores, including speech quality, speech intelligibility and signal distortion.