ISCA Archive Interspeech 2025

Quadruple Path Modeling with Latent Feature Transfer for Permutation-free Continuous Speech Separation

Jihyun Kim, Doyeon Kim, Hyewon Han, Jinyoung Lee, Jonguk Yoo, Chang Woo Han, Jeongook Song, Hoon-Young Cho, Hong-Goo Kang

This paper proposes Quadruple Path Modeling (QPM), a permutation-free and generalized continuous speech separation (CSS) model designed to handle varying speaker conditions and efficiently address the permutation problem in chunk-based streaming scenarios. QPM integrates intra-chunk feature modeling, inter-speaker and inter-chunk processing, and latent feature transfer (LFT) modules to enhance separation performance while ensuring speaker consistency across segments. By leveraging a memory-based inter-chunk mechanism and a learnable gating strategy, QPM effectively propagates relevant speaker information across segments, thus reducing speaker permutation errors in streaming CSS tasks. Designed for lightweight and low-latency applications, including live streaming, QPM demonstrates strong performance using a 2-second chunk size. Experimental results confirm the efficacy of the proposed system in resolving the permutation problem, offering a scalable and adaptable solution for CSS.
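As a rough illustration of the chunk-based streaming setup and the gated inter-chunk memory described above, the following sketch splits a signal into 2-second chunks and blends a running memory vector with each new chunk feature through a sigmoid gate. It is not the QPM architecture: the 16 kHz sampling rate, the function names (`split_into_chunks`, `gated_memory_update`), the scalar gate parameters, and the element-wise gate form are all assumptions chosen for illustration. In a trained model the gate parameters would be learned.

```python
import math


def split_into_chunks(samples, sample_rate=16000, chunk_seconds=2.0):
    """Split a 1-D list of samples into consecutive, non-overlapping chunks.

    sample_rate is an assumed value; the last chunk may be shorter.
    """
    size = int(sample_rate * chunk_seconds)
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_memory_update(memory, chunk_feature, w_mem=1.0, w_new=1.0, bias=0.0):
    """Blend the previous memory with the current chunk feature, element-wise.

    Hypothetical gate: g = sigmoid(w_mem * m + w_new * h + bias).
    A gate near 1 keeps the old memory; a gate near 0 adopts the new feature.
    """
    updated = []
    for m, h in zip(memory, chunk_feature):
        g = sigmoid(w_mem * m + w_new * h + bias)
        updated.append(g * m + (1.0 - g) * h)
    return updated


def run_stream(chunk_features, initial_memory):
    """Carry the memory across chunks, returning the memory after each chunk."""
    memory = list(initial_memory)
    history = []
    for feature in chunk_features:
        memory = gated_memory_update(memory, feature)
        history.append(memory)
    return history
```

For example, `split_into_chunks([0.0] * 80000)` yields three chunks (two of 2 seconds and one of 1 second at the assumed 16 kHz), and `run_stream` shows how per-speaker information could persist from one chunk to the next so that output channels keep a consistent speaker order.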