ISCA Archive Interspeech 2025

ClearerVoice-Studio: Bridging Advanced Speech Processing Research and Practical Deployment

Shengkui Zhao, Zexu Pan, Bin Ma

This paper introduces ClearerVoice-Studio, an open-source, AI-powered speech processing toolkit designed to bridge cutting-edge research and practical applications. Unlike broad platforms such as SpeechBrain and ESPnet, ClearerVoice-Studio focuses on a set of interconnected speech tasks: speech enhancement, speech separation, speech super-resolution, and multimodal target speaker extraction. A key advantage is its suite of state-of-the-art pretrained models, including FRCRN (3M+ uses) and MossFormer (2.5M+ uses), which are optimized for real-world scenarios. It also offers model optimization tools, multi-format audio support, the SpeechScore evaluation toolkit, and user-friendly interfaces, serving researchers, developers, and end users alike. Its rapid adoption (2.8K GitHub stars, 200+ forks) highlights its academic and industrial impact. This paper details ClearerVoice-Studio’s capabilities, architectures, training strategies, benchmarks, community impact, and future plans.