ISCA Archive Interspeech 2022

Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks

Tomohiro Tanaka, Ryo Masumura, Hiroshi Sato, Mana Ihori, Kohei Matsuura, Takanori Ashihara, Takafumi Moriya

In this paper, we propose a novel self-supervised speech representation learning method that obtains domain-invariant representations by using a domain adversarial neural network. Recently, self-supervised representation learning has been actively studied in the speech field. Since self-supervised learning requires large-scale unlabeled data, we need to effectively use data collected from a variety of domains. However, existing methods cannot construct valid representations in unknown domains because they overfit to the domains in the training data. To solve this problem, our proposed method constructs contextual representations from which the domain of the input speech cannot be identified, by using domain adversarial neural networks. This domain adversarial training improves robustness to data in unknown domains because the trained model constructs domain-invariant representations. In addition, we investigate multi-task learning of representation construction and domain classification to take domain information into account. Experimental results show that our proposed method outperforms the conventional training method of wav2vec 2.0 in unknown-domain downstream automatic speech recognition tasks.
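
The following is a minimal sketch (in PyTorch) of the general domain adversarial idea the abstract describes, not the authors' implementation: a gradient reversal layer feeds pooled contextual representations into a domain classifier, so the encoder is pushed toward representations from which the domain cannot be identified. The encoder output shape, pooling, loss weighting, and dimensions are illustrative assumptions; the self-supervised objective (e.g. the wav2vec 2.0 contrastive loss) is stubbed out.

import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class DomainAdversarialHead(nn.Module):
    """Domain classifier applied to mean-pooled contextual representations (illustrative)."""

    def __init__(self, rep_dim: int, num_domains: int, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Sequential(
            nn.Linear(rep_dim, rep_dim),
            nn.ReLU(),
            nn.Linear(rep_dim, num_domains),
        )

    def forward(self, context_reps: torch.Tensor) -> torch.Tensor:
        # context_reps: (batch, time, rep_dim) contextual representations from the encoder.
        pooled = context_reps.mean(dim=1)                  # utterance-level summary
        reversed_reps = GradReverse.apply(pooled, self.lambd)
        return self.classifier(reversed_reps)              # domain logits


if __name__ == "__main__":
    # Illustrative training step: total loss = self-supervised loss (placeholder here)
    # + domain classification loss; gradient reversal makes the encoder oppose the latter.
    batch, time, rep_dim, num_domains = 4, 50, 256, 3
    context_reps = torch.randn(batch, time, rep_dim, requires_grad=True)
    domain_labels = torch.randint(0, num_domains, (batch,))

    head = DomainAdversarialHead(rep_dim, num_domains, lambd=0.1)
    domain_logits = head(context_reps)

    ssl_loss = context_reps.pow(2).mean()                  # stand-in for the SSL objective
    domain_loss = nn.CrossEntropyLoss()(domain_logits, domain_labels)
    total_loss = ssl_loss + domain_loss
    total_loss.backward()

The multi-task variant mentioned in the abstract would differ only in removing the gradient reversal (or setting lambd to a negative value), so that the encoder cooperates with, rather than opposes, the domain classifier.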