ISCA Archive Interspeech 2024

Multi-Channel Extension of Pre-trained Models for Speaker Verification

Ladislav Mošner, Romain Serizel, Lukáš Burget, Oldřich Plchot, Emmanuel Vincent, Junyi Peng, Jan Černocký

In this work, we focus on designing a multi-channel speech processing system based on large pre-trained models. These models are typically trained for single-channel scenarios via self-supervised learning (SSL). A common approach to using SSL models with microphone array data is to prepend a multi-channel speech enhancement front-end. The downside is that spatial information can be leveraged only by the pre-processing stage, and enhancement errors propagate to the SSL model. We aim to alleviate these issues by designing METRO, a Multi-channel ExTension of pRe-trained mOdels. It interleaves per-channel processing with cross-channel information exchange, eventually fusing the channels into one. While our approach is general, here we focus on multi-channel speaker verification. Our experiments on the MultiSV corpus show noteworthy improvements over the best published results on the dataset.
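
To make the interleaving idea concrete, the following is a minimal NumPy sketch of the general pattern only, not the METRO architecture itself. Everything in it is an assumption made for illustration: the function names, the `tanh` layer standing in for a pre-trained model block with weights shared across channels, the mean over channels standing in for cross-channel information exchange, and the averaging used for the final fusion.

```python
import numpy as np


def per_channel_layer(x, weight):
    """Apply the same (shared) weights to every channel independently.

    x has shape (channels, frames, dim). This is a hypothetical stand-in
    for one block of a single-channel pre-trained model.
    """
    return np.tanh(x @ weight)


def cross_channel_exchange(x):
    """Let channels share information: add the cross-channel mean to each one.

    A mean is only an assumed placeholder for the exchange mechanism.
    """
    return x + x.mean(axis=0, keepdims=True)


def fuse_channels(x):
    """Collapse the channel axis into a single stream (assumed: simple average)."""
    return x.mean(axis=0)


def interleaved_forward(x, weights):
    """Alternate per-channel processing and cross-channel exchange, then fuse."""
    for weight in weights:
        x = per_channel_layer(x, weight)
        x = cross_channel_exchange(x)
    return fuse_channels(x)


rng = np.random.default_rng(0)
channels, frames, dim = 4, 50, 16  # toy sizes, chosen arbitrarily
features = rng.standard_normal((channels, frames, dim))
weights = [rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3)]

fused = interleaved_forward(features, weights)
print(fused.shape)  # (50, 16)
```

Because the per-channel weights are shared and both the exchange and the fusion in this sketch are means over channels, the output does not depend on the order of the microphones. That property belongs to this toy example and is not a claim about the proposed system.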