ISCA Archive Interspeech 2023

Noise-Robust Bandwidth Expansion for 8K Speech Recordings

Yin-Tse Lin, Bo-Hao Su, Chi-Han Lin, Shih-Chan Kuo, Jyh-Shing Roger Jang, Chi-Chun Lee

Speech recordings in call centers are narrowband and contaminated with various noises. A bandwidth expansion (BWE) model is therefore important for narrowing the automatic speech recognition (ASR) performance gap between speech data recorded at low and high sampling rates. To further address in-the-wild noise in call center settings, we propose an Embedding-Polished Wave-U-Net (EP-WUN) that includes an additional speech quality classifier, so that denoising and bandwidth expansion of 8 kHz audio are handled simultaneously. On a well-known BWE dataset (the Valentini-Botinhao corpus), our framework improves speech quality metrics over the current state-of-the-art noise-robust BWE model, with 33% fewer parameters. It also achieves an 11.71% word error rate reduction when evaluated on a real-world interactive voice response system from E.SUN Bank.
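
For readers unfamiliar with the ASR metric quoted above, word error rate is conventionally defined as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in this work. The function names and example transcripts are hypothetical, and the `relative_reduction` helper assumes one possible reading of "reduction" (relative to a baseline), which the abstract does not specify.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Assumed reading of 'WER reduction': improvement relative to baseline."""
    return (baseline_wer - improved_wer) / baseline_wer


# Hypothetical transcripts, invented purely for illustration.
reference = "check my account balance"
narrowband_output = "check account balance please"
wer = word_error_rate(reference, narrowband_output)
```

In this toy case one reference word is dropped and one extra word is inserted, so two edits over four reference words are counted. A comparison between a baseline system and a BWE-enhanced one would call `word_error_rate` on each system's transcripts and pass the two averages to `relative_reduction`.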