ISCA Archive Interspeech 2022

Unsupervised Training of Sequential Neural Beamformer Using Coarsely-separated and Non-separated Signals

Kohei Saijo, Tetsuji Ogawa

We present an unsupervised training method for the sequential neural beamformer (Seq-BF) using coarsely-separated and non-separated supervisory signals. Signals coarsely separated by blind source separation (BSS) have been used to train neural separators in an unsupervised manner. However, the performance is limited by distortions in the supervision. In contrast, remix-cycle-consistent learning (RCCL) trains a separator directly on distortion-free observed mixtures: two different mixtures are repeatedly separated and remixed, and the resulting remixed mixtures are driven closer to the originals. Still, training with RCCL from scratch often falls into a trivial solution, i.e., not separating the signals at all. The present study provides a novel unsupervised learning algorithm for the Seq-BF with two stacked neural separators, in which the separators are pre-trained using the BSS outputs and then fine-tuned with RCCL. This configuration compensates for the shortcomings of both approaches: the guiding mechanism in the Seq-BF accelerates separation beyond BSS performance, thereby stabilizing RCCL. Experimental comparisons demonstrated that the proposed unsupervised learning achieved performance comparable to supervised learning (a 0.4-point difference in word error rate).
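To make the remix-cycle-consistency objective concrete, the following is a minimal single-channel sketch in PyTorch. It assumes a hypothetical `separator` callable that maps a two-speaker mixture of shape (batch, time) to estimated sources of shape (batch, 2, time), and it assumes a fixed output ordering across the two separation passes; the actual system resolves source correspondence explicitly (e.g., with permutation handling), operates on multichannel signals with beamforming, and its loss details may differ from this illustration.

```python
import torch


def rccl_loss(separator, mix_a, mix_b):
    """Remix-cycle-consistency loss (illustrative sketch, not the paper's exact loss).

    Two observed mixtures are separated, their estimated sources are swapped
    to form new pseudo-mixtures, which are separated again and remixed back;
    the remixed signals are penalized for deviating from the original,
    distortion-free mixtures.
    """
    # First separation: each mixture -> two estimated sources, (batch, 2, time)
    srcs_a = separator(mix_a)
    srcs_b = separator(mix_b)

    # Remix across mixtures: swap one source between the two mixtures
    remix_1 = srcs_a[:, 0] + srcs_b[:, 1]
    remix_2 = srcs_b[:, 0] + srcs_a[:, 1]

    # Second separation of the pseudo-mixtures
    srcs_1 = separator(remix_1)
    srcs_2 = separator(remix_2)

    # Remix back: assuming consistent output ordering, these should
    # reconstruct the original observed mixtures
    cycle_a = srcs_1[:, 0] + srcs_2[:, 1]
    cycle_b = srcs_2[:, 0] + srcs_1[:, 1]

    # Cycle-consistency penalty against the distortion-free observations
    return torch.mean((cycle_a - mix_a) ** 2) + torch.mean((cycle_b - mix_b) ** 2)
```

Because the target of this loss is the observed mixture itself, no clean source references are needed; this is also why an untrained separator that passes mixtures through unchanged satisfies the objective trivially, motivating the BSS-based pre-training described above.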