ISCA Archive Interspeech 2023

Overlap Aware Continuous Speech Separation without Permutation Invariant Training

Linfeng Yu, Wangyou Zhang, Chenda Li, Yanmin Qian

Continuous speech separation (CSS) aims to separate a long-form signal containing multiple partially overlapped utterances into a set of non-overlapped speech signals. While most existing CSS methods rely on the permutation invariant training (PIT) algorithm for training and inference, we argue that PIT may not be needed at all to achieve promising CSS performance. In this paper, we propose a novel overlap-aware CSS method, which explicitly identifies the non-overlapped segments in the long-form input to guide the separation of the overlapped segments. We show that, with the help of an external overlapping speech detection (OSD) model, an overlap-aware CSS model can be trained without PIT. In addition, an overlap-aware inference algorithm is proposed to greatly reduce the computational cost while preserving strong performance. Experimental results show that our proposed methods outperform the conventional stitching-based CSS approach, with over 1 dB of signal-to-noise ratio (SNR) improvement.
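To make the overlap-aware inference idea concrete, the following is a minimal sketch (not the authors' implementation) of the control flow it implies: an OSD model first marks which frames of the long-form mixture are overlapped, and the costly separation network is then run only on overlapped segments, while non-overlapped segments are copied straight to an output channel. The functions detect_overlap and separate, the frame length, and the two-channel output are all hypothetical placeholders assumed here for illustration.

import numpy as np

def detect_overlap(mixture, frame_len=1600):
    """Hypothetical OSD stub: one boolean per frame indicating
    whether more than one speaker is active in that frame."""
    n_frames = len(mixture) // frame_len
    # A real OSD model would produce these labels; the stub marks no overlap.
    return np.zeros(n_frames, dtype=bool)

def separate(segment, n_out=2):
    """Hypothetical separation-network stub producing n_out output channels."""
    return np.tile(segment, (n_out, 1)) / n_out

def overlap_aware_inference(mixture, frame_len=1600, n_out=2):
    """Run the separator only on overlapped frames; pass
    non-overlapped frames through to the first output channel."""
    overlap = detect_overlap(mixture, frame_len)
    outputs = np.zeros((n_out, len(mixture)))
    for i, is_ovl in enumerate(overlap):
        start, end = i * frame_len, (i + 1) * frame_len
        seg = mixture[start:end]
        if is_ovl:
            outputs[:, start:end] = separate(seg, n_out)
        else:
            outputs[0, start:end] = seg
    return outputs

if __name__ == "__main__":
    mix = np.random.randn(16000)          # 1 s of 16 kHz audio
    est = overlap_aware_inference(mix)
    print(est.shape)                      # (2, 16000)

Because the separator is skipped on non-overlapped segments, which dominate typical long-form recordings, this routing is where the computational savings reported in the abstract would come from.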