Hybrid Dataset for Speech Emotion Recognition in Russian Language

Kondratenko, Vladimir; Karpov, Nikolay; Sokolov, Artem; Savushkin, Nikita; Kutuzov, Oleg; Minkin, Fyodor

doi:10.21437/Interspeech.2023-311

We present Dusha, a new data set for speech emotion recognition (SER). The corpus contains approximately 350 hours of data, comprising more than 300 000 audio recordings of Russian speech and their transcripts. This makes it the largest bi-modal data collection currently available under an open license for SER tasks. It is also the first Russian speech emotion corpus to include both crowd-sourced acted emotions and real-life emotions from podcasts, with multiple speakers and a scalable data set size. The acted subset has a more balanced class distribution than the real-life subset, which consists of audio podcasts and is unbalanced. The acted subset is therefore suitable for model pre-training, while the real-life subset is intended for fine-tuning, model testing, and validation. This paper describes in detail our collection procedure, pre-processing routine, and annotation, as well as an experiment with a baseline model that demonstrates the metrics attainable with the Dusha data set.

Cite as: Kondratenko, V., Karpov, N., Sokolov, A., Savushkin, N., Kutuzov, O., Minkin, F. (2023) Hybrid Dataset for Speech Emotion Recognition in Russian Language. Proc. INTERSPEECH 2023, 4548-4552, doi: 10.21437/Interspeech.2023-311

@inproceedings{kondratenko23_interspeech,
  author={Vladimir Kondratenko and Nikolay Karpov and Artem Sokolov and Nikita Savushkin and Oleg Kutuzov and Fyodor Minkin},
  title={{Hybrid Dataset for Speech Emotion Recognition in Russian Language}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={4548--4552},
  doi={10.21437/Interspeech.2023-311},
  issn={2308-457X}
}