ISCA Archive Interspeech 2025

SPCODEC: Split and Prediction for Neural Speech Codec

Liang Wen, Lizhong Wang, Yuxing Zheng, Weijing Shi, Kwang Pyo Choi

Recent advancements in time-domain end-to-end neural speech codecs have significantly improved performance. However, existing codecs fail to fully exploit the correlations across different frequency bands in speech, leading to inefficiencies and reduced interpretability. In this paper, we introduce SPCODEC, a time-domain end-to-end neural speech codec featuring a latent split-and-prediction scheme. The model consists of a fully convolutional encoder-decoder and a group residual vector quantization module enhanced with a split-and-prediction mechanism. This mechanism disentangles low- and high-frequency representations and employs prediction to effectively reduce feature redundancy. SPCODEC achieves state-of-the-art MOS-POLQA scores of 4.0 at 6/8 kbps and 4.5 at 10.66/16 kbps for wideband and super-wideband speech, significantly outperforming both neural and traditional codecs.
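To make the split-and-prediction idea concrete, the following is a minimal NumPy sketch and not the authors' implementation. The abstract does not specify how the latent is split, what form the predictor takes, or how the codebooks are trained, so everything here is an assumption made for illustration: the latent is split in half along the channel axis, the predictor is a linear map fitted by least squares (the actual model presumably uses a learned network), the codebooks are random, and the names `rvq_encode`, `fit_linear_predictor`, and `split_predict_quantize` are hypothetical. The sketch quantizes the low-band group, predicts the high-band group from the quantized low band, and quantizes only the prediction residual.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Residual VQ: each stage quantizes what the previous stages left over."""
    residual = x.copy()
    quantized = np.zeros_like(residual)
    indices = []
    for cb in codebooks:  # cb has shape (K, D)
        # Squared Euclidean distance from every frame to every codeword: (T, K)
        dist = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)
        chosen = cb[idx]
        quantized += chosen
        residual -= chosen
        indices.append(idx)
    return quantized, indices

def fit_linear_predictor(q_low, z_high):
    """Assumed stand-in for a learned predictor: least-squares linear map."""
    return np.linalg.lstsq(q_low, z_high, rcond=None)[0]

def split_predict_quantize(z, predictor, cbs_low, cbs_high):
    """Split the latent, quantize the low group, code only the high-group residual."""
    half = z.shape[1] // 2  # assumption: split along the channel axis
    z_low, z_high = z[:, :half], z[:, half:]
    q_low, idx_low = rvq_encode(z_low, cbs_low)
    pred_high = q_low @ predictor          # decoder can recompute this from idx_low
    residual = z_high - pred_high          # only this part is sent for the high group
    q_res, idx_high = rvq_encode(residual, cbs_high)
    z_hat = np.concatenate([q_low, pred_high + q_res], axis=1)
    return z_hat, idx_low, idx_high, residual

# Toy latent whose high group is correlated with its low group (synthetic data).
rng = np.random.default_rng(0)
T, D, K = 256, 8, 32
half = D // 2
z_low = rng.normal(size=(T, half))
z_high = z_low @ rng.normal(size=(half, half)) + 0.1 * rng.normal(size=(T, half))
z = np.concatenate([z_low, z_high], axis=1)

# Random, untrained codebooks; two RVQ stages per group.
cbs_low = [rng.normal(size=(K, half)) * s for s in (1.0, 0.5)]
cbs_high = [rng.normal(size=(K, half)) * s for s in (0.5, 0.25)]

q_low, _ = rvq_encode(z_low, cbs_low)
predictor = fit_linear_predictor(q_low, z_high)
z_hat, idx_low, idx_high, residual = split_predict_quantize(z, predictor, cbs_low, cbs_high)

print("high-group energy:   ", float((z_high ** 2).sum()))
print("residual energy:     ", float((residual ** 2).sum()))
```

Because the predictor is a least-squares fit, the residual energy cannot exceed the energy of the unpredicted high group, which is the sense in which prediction removes redundancy in this toy setup. How much of that carries over to a trained codec depends on details the abstract does not give.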