ISCA Archive Interspeech 2023
ISCA Archive Interspeech 2023

Speaker Extraction with Detection of Presence and Absence of Target Speakers

Ke Zhang, Marvin Borsdorf, Zexu Pan, Haizhou Li, Yangjie Wei, Yi Wang

Target speaker extraction extracts a target voice from a given cocktail party mixture signal. Most studies are restricted to conditions in which the target speaker is present in the mixture (PT), which often fail when the target speaker is absent (AT). Training on both PT and AT situations helps, but degrades the PT performance as the model intrinsically tries to detect the target presence. We propose a new model, called TSEJoint, that jointly performs target speaker detection and extraction. Both tasks share the low-level modules, allowing the detection branch to use a pre-separated signal and keeping the overall processing pipeline length similar, while at the high-level they have different branches to ensure the performance of each task. We evaluate our proposed methods under PT and AT conditions comprising one and two talkers. The TSEJoint model shows better extraction performance under the PT condition and better detection performance on all conditions compared with the baseline.