Dialect Speech Recognition Modeling using Corpus of Japanese Dialects and Self-Supervised Learning-based Model XLSR

Miwa, Shogo; Kai, Atsuhiko

doi:10.21437/Interspeech.2023-2463

To utilize the large volume of historical speech resources for applications such as linguistic analysis and retrieval, automatic speech recognition technology that can handle a variety of dialects is required. Although the Japanese language has many dialects, there have been no reports of speech recognition models that cover almost all Japanese dialects using only shared dialect resources. This paper presents a baseline for dialect speech recognition of spoken Japanese using a nationwide corpus of Japanese dialects released in 2022. Specifically, the paper presents results on: 1) the effectiveness of adapting a self-supervised learning model, which has been shown to be effective for low-resource languages, to the dialect corpus; 2) the effectiveness, within the framework of self-supervised learning, of combining the automatic speech recognition and dialect region identification tasks, or of additionally using a large-scale corpus of standard Japanese.

Cite as: Miwa, S., Kai, A. (2023) Dialect Speech Recognition Modeling using Corpus of Japanese Dialects and Self-Supervised Learning-based Model XLSR. Proc. INTERSPEECH 2023, 4928-4932, doi: 10.21437/Interspeech.2023-2463

@inproceedings{miwa23_interspeech,
  author={Shogo Miwa and Atsuhiko Kai},
  title={{Dialect Speech Recognition Modeling using Corpus of Japanese Dialects and Self-Supervised Learning-based Model XLSR}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={4928--4932},
  doi={10.21437/Interspeech.2023-2463}
}