ISCA Archive Interspeech 2025

ReSepNet: A Unified-Light Model for Recursive Speech Separation with Unknown Speaker Count

Hadi Alizadeh, Rahil Mahdian Toroghi, Hassan Zareian

Single-channel speech separation remains a significant challenge, particularly when the number of concurrent speakers is unknown. Existing methods often rely on prior assumptions about speaker count, limiting their real-world applicability. This paper introduces ReSepNet (Recursive Separation Network), a novel, unified model for speaker-independent speech separation that dynamically adapts to an unknown number of speakers. ReSepNet employs a recursive separation architecture, enabling it to iteratively isolate individual voices without prior knowledge of the speaker count. To enhance efficiency and reduce model complexity, we introduce a novel objective function and workflow. We demonstrate the effectiveness of ReSepNet on the WSJ0 datasets, achieving state-of-the-art separation performance and accurate speaker count estimation. Furthermore, ReSepNet generalizes well to mixtures containing four and five speakers, showcasing its robustness and adaptability to challenging scenarios.
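The recursive idea described above can be illustrated with a minimal control-flow sketch: peel off one speaker at a time and stop once the residual is close to silent. This is not the ReSepNet architecture or its objective. The names `recursive_separate`, `separate_one`, `max_speakers`, and `residual_ratio` are hypothetical, the energy-based stopping rule is an assumption made for illustration, and the oracle separator is a toy stand-in for a trained one-speaker extraction network.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]
# Hypothetical interface: a one-step separator maps a mixture to
# (one extracted speaker, residual mixture of the remaining speakers).
OneStepSeparator = Callable[[Signal], Tuple[Signal, Signal]]


def energy(x: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(v * v for v in x)


def recursive_separate(
    mixture: Signal,
    separate_one: OneStepSeparator,
    max_speakers: int = 5,
    residual_ratio: float = 1e-3,
) -> List[Signal]:
    """Extract speakers one by one until the residual is nearly silent.

    The stopping rule (residual energy below a fraction of the input
    energy) is an assumption for this sketch, not the paper's criterion.
    The number of returned signals serves as the speaker count estimate.
    """
    total = energy(mixture)
    estimated: List[Signal] = []
    residual = list(mixture)
    if total == 0.0:
        return estimated
    while len(estimated) < max_speakers and energy(residual) > residual_ratio * total:
        speaker, residual = separate_one(residual)
        estimated.append(speaker)
    return estimated


def make_oracle_separator(true_sources: List[Signal]) -> OneStepSeparator:
    """Toy stand-in for a trained model: returns the known sources in order."""
    remaining = [list(s) for s in true_sources]

    def separate_one(residual: Signal) -> Tuple[Signal, Signal]:
        speaker = remaining.pop(0)
        new_residual = [r - s for r, s in zip(residual, speaker)]
        return speaker, new_residual

    return separate_one


# Three toy "speakers"; the loop is not told how many there are.
sources = [
    [1.0, 0.0, -1.0, 0.0],
    [0.0, 2.0, 0.0, -2.0],
    [0.5, 0.5, 0.5, 0.5],
]
mixture = [sum(samples) for samples in zip(*sources)]
estimated = recursive_separate(mixture, make_oracle_separator(sources))
estimated_count = len(estimated)
```

In this sketch the same one-step separator is reused at every recursion level, so a single model handles any speaker count up to `max_speakers`, and the count estimate falls out of the number of iterations rather than being supplied in advance.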