ISCA Archive Interspeech 2023

Multi-mode Neural Speech Coding Based on Deep Generative Networks

Wei Xiao, Wenzhe Liu, Meng Wang, Shan Yang, Yupeng Shi, Yuyong Kang, Dan Su, Shidong Shang, Dong Yu

Wideband and super-wideband speech, with its higher-resolution spectrum, is one of the most prominent features of real-time communication services. However, it comes at a higher computational cost. In this paper, we introduce the Penguins codec, built on a multi-mode neural speech coding structure that combines sub-band speech processing with different strategies for the low band and the high band. Specifically, it uses deep generative networks trained with perceptually constrained loss functions and knowledge distillation to reconstruct the wideband components, and bandwidth extension to generate artificial super-wideband components. The method produces high-quality speech at very low bitrates. We conducted several subjective and objective experiments, including ablation studies, and the results demonstrate the merit of the proposed scheme compared with traditional coding schemes and state-of-the-art neural coding methods.
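To make the idea of sub-band processing concrete, here is a minimal sketch of a two-band analysis/synthesis filter bank that splits a signal into a decimated low band and high band and recombines them. The abstract does not say which filter bank Penguins uses, so this example assumes the simplest perfect-reconstruction pair (the Haar sum/difference filters). The function names `split_bands` and `merge_bands` are hypothetical and illustrative only. The neural reconstruction and bandwidth extension stages are not modeled here.

```python
import math
from typing import List, Tuple

# Assumption: the Haar two-band filter bank, chosen only because it is the simplest
# perfect-reconstruction split; it is NOT the filter bank used by the Penguins codec.
INV_SQRT2 = 1.0 / math.sqrt(2.0)


def split_bands(x: List[float]) -> Tuple[List[float], List[float]]:
    """Analysis stage: split x into a low band and a high band, each at half the rate."""
    if len(x) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = [(x[i], x[i + 1]) for i in range(0, len(x), 2)]
    low = [(a + b) * INV_SQRT2 for a, b in pairs]   # low-pass (sum) branch
    high = [(a - b) * INV_SQRT2 for a, b in pairs]  # high-pass (difference) branch
    return low, high


def merge_bands(low: List[float], high: List[float]) -> List[float]:
    """Synthesis stage: upsample and recombine the two bands into one signal."""
    if len(low) != len(high):
        raise ValueError("bands must have equal length")
    y: List[float] = []
    for lo, hi in zip(low, high):
        y.append((lo + hi) * INV_SQRT2)  # recovers the even-indexed sample
        y.append((lo - hi) * INV_SQRT2)  # recovers the odd-indexed sample
    return y


if __name__ == "__main__":
    signal = [0.5, 0.1, -0.3, 0.8, 0.2, -0.6]
    low, high = split_bands(signal)
    # Each band could be handled by its own strategy here; with both bands left
    # untouched, synthesis recovers the input up to floating-point rounding.
    restored = merge_bands(low, high)
    print(max(abs(a - b) for a, b in zip(signal, restored)))
    # Discarding the high band keeps only low-band content: the part that a
    # bandwidth-extension stage would have to regenerate artificially.
    low_only = merge_bands(low, [0.0] * len(high))
    print([round(v, 3) for v in low_only])
```

The design point this illustrates is that once the bands are separated, the low band and high band can be coded independently, for example spending bits on the perceptually important low band while synthesizing the high band instead of transmitting it.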