ISCA Archive Interspeech 2022

Joint Optimization of the Module and Sign of the Spectral Real Part Based on CRN for Speech Denoising

Zilu Guo, Xu Xu, Zhongfu Ye

Recently, several novel techniques have used sophisticated algorithms to correct the phase, or to exploit phase information, by processing the real and imaginary parts separately or simultaneously in the STFT domain. However, neural networks cannot process a complex-valued feature, i.e., the STFT of a noisy speech signal. Therefore, methods that estimate the STFT of the clean signal can only obtain sub-optimal solutions. To avoid complex-valued operations, we formulate the problem so that only the real part of the 2K-point STFT is used as the feature, which holds all of the signal information. Speech enhancement in the STFT domain thus becomes a real-valued task. However, it is hard for the network to estimate the correct sign of the real part. Consequently, we develop an estimator to predict the sign of the real part and a decoder to estimate the mask of the target real part. We then conduct experiments to evaluate our model on a variety of metrics. The results indicate that our model outperforms several state-of-the-art (SOTA) models.
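The claim that the real part of a 2K-point STFT holds all signal information can be illustrated with a small numerical sketch. It assumes (this is an interpretation, not a detail stated in the abstract) that each frame has K real samples and is zero-padded to 2K points before the DFT. Under that assumption, the inverse DFT of the real part gives the even part of the padded frame, and because the second half of the frame is zero, the original samples can be read off directly. The function names below are hypothetical and chosen for illustration only.

```python
import numpy as np

def real_part_2k(frame):
    """Real part of the 2K-point DFT of a K-sample real frame.

    Assumption: the frame is zero-padded from K to 2K samples.
    """
    K = len(frame)
    padded = np.concatenate([frame, np.zeros(K)])
    return np.fft.fft(padded).real

def reconstruct_from_real_part(real_part):
    """Recover the K-sample frame from the real part alone.

    The inverse DFT of Re{X[k]} is the even part of the padded frame,
    (x[n] + x[(-n) mod 2K]) / 2. Since x[n] = 0 for n >= K, the mirrored
    term vanishes for n = 1..K-1, so x[n] = 2 * even[n] there, while
    x[0] = even[0].
    """
    K = len(real_part) // 2
    even = np.fft.ifft(real_part).real
    frame = 2.0 * even[:K]
    frame[0] = even[0]
    return frame

# Illustrative check on a random frame (K = 256 is an arbitrary choice).
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
recovered = reconstruct_from_real_part(real_part_2k(frame))
print(np.allclose(frame, recovered))
```

In this sketch the real part can be written as its sign times its module (absolute value), which is the decomposition the title refers to; how the sign estimator and mask decoder are actually built is not specified in the abstract and is not modeled here.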