ISCA Archive Interspeech 2022
ISCA Archive Interspeech 2022

Cooperative Speech Separation With a Microphone Array and Asynchronous Wearable Devices

Ryan Corey, Manan Mittal, Kanad Sarkar, Andrew C. Singer

We consider the problem of separating speech from several talkers in background noise using a fixed microphone array and a set of wearable devices. Wearable devices can provide reliable information about speech from their wearers, but they typically cannot be used directly for multichannel source separation due to network delay, sample rate offsets, and relative motion. Instead, the wearable microphone signals are used to compute the speech presence probability for each talker at each time-frequency index. Those parameters, which are robust against small sample rate offsets and relative motion, are used to track the second-order statistics of the speech sources and background noise. The fixed array then separates the speech signals using an adaptive linear time-varying multichannel Wiener filter. The proposed method is demonstrated using real-room recordings from three human talkers with binaural earbud microphones and an eight-microphone tabletop array.