ISCA Archive Interspeech 2024

DGSRN: Noise-Robust Speech Recognition Method with Dual-Path Gated Spectral Refinement Network

Wenjun Wang, Shangbin Mo, Ling Dong, Zhengtao Yu, Junjun Guo, Yuxin Huang

Advances in speech recognition have led to significant progress on clean speech. However, challenges persist in real-world noisy environments. To address speech distortion and residual noise in signals processed by speech enhancement models, we propose a noise-robust speech recognition method based on the Dual-Path Gated Spectral Refinement Network (DGSRN). We first construct a single-channel speech enhancement model based on dense time-frequency convolutional networks to perform first-stage noise suppression. The Dual-Path Gated Spectral Refinement Network is then designed to extract useful features from the estimated noise to further improve speech quality. Multi-task joint training is conducted using a weighted speech distortion loss function. Experimental results demonstrate that, compared with traditional joint training approaches, DGSRN achieves a 12.41% reduction in Character Error Rate (CER), addressing the issue of mismatched performance across evaluation metrics.
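The abstract does not give the exact form of the weighted speech distortion loss or of the joint objective. As an illustration only, the sketch below assumes the widely used weighted SDR (wSDR) loss introduced with Deep Complex U-Net, which mixes a clean-speech term and a noise term according to their energy ratio. It then combines this with a recognition loss through a hypothetical interpolation weight `lam`. The function names and the weighting scheme are assumptions for illustration, not the authors' implementation.

```python
import math


def neg_sdr(ref, est, eps=1e-8):
    """Negative cosine similarity between a reference and an estimate (range [-1, 1])."""
    dot = sum(r * e for r, e in zip(ref, est))
    norm = math.sqrt(sum(r * r for r in ref)) * math.sqrt(sum(e * e for e in est))
    return -dot / (norm + eps)


def weighted_sdr_loss(noisy, clean, est, eps=1e-8):
    """wSDR-style loss (assumed form): weight speech and noise terms by energy ratio."""
    noise = [y - x for y, x in zip(noisy, clean)]      # true noise z = x_noisy - y_clean
    noise_est = [y - s for y, s in zip(noisy, est)]    # implied noise estimate
    e_clean = sum(x * x for x in clean)
    e_noise = sum(n * n for n in noise)
    alpha = e_clean / (e_clean + e_noise + eps)        # share of energy belonging to speech
    return alpha * neg_sdr(clean, est, eps) + (1 - alpha) * neg_sdr(noise, noise_est, eps)


def joint_loss(enh_loss, asr_loss, lam=0.5):
    """Hypothetical multi-task objective: interpolate enhancement and ASR losses."""
    return lam * enh_loss + (1 - lam) * asr_loss


if __name__ == "__main__":
    noisy = [1.0, 0.5, -0.3, 0.2]
    clean = [0.8, 0.6, -0.5, 0.0]
    print(weighted_sdr_loss(noisy, clean, clean))                  # close to -1 (perfect)
    print(weighted_sdr_loss(noisy, clean, [0.1, -0.2, 0.3, 0.4]))  # larger (worse)
```

A perfect estimate drives both cosine terms to -1, so the loss approaches its minimum of -1. Weighting by energy keeps the noise term from being ignored at high SNR, which is one plausible reason to prefer such a loss when the refinement stage reasons explicitly about the estimated noise.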