ISCA Archive CHiME 2024

The CHiME-8 MMCSG Challenge: Multi-modal conversations in smart glasses

Kateřina Žmolíková, Simone Merello, Kaustubh Kalgaonkar, Ju Lin, Niko Moritz, Pingchuan Ma, Ming Sun, Honglie Chen, Antoine Saliou, Stavros Petridis, Christian Fuegen, Michael Mandel

The increasing adoption of smart glasses has opened the way for innovative applications such as live speech captioning and translation, presenting exciting new research problems and opportunities. To raise the visibility of this research topic and support researchers in the field, we introduce the Multi-modal Conversations in Smart Glasses (MMCSG) dataset and challenge. The MMCSG dataset comprises two-party conversations recorded through Aria smart glasses worn by one of the participants, accompanied by manual annotations. Several modalities are available, including multi-channel audio, video, and inertial measurement unit (IMU) measurements. Additionally, we are releasing the Multi-channel Audio Conversation Simulator (MCAS) dataset and tools; the simulator is designed to generate extensive simulated training data, simplifying the development of robust systems. In the challenge, we evaluate speaker-attributed speech recognition systems on both multi-talker word error rate and algorithmic latency. To assist participants, we provide two baseline models, which serve as starting points for development and as benchmarks for comparison. We hope that these resources will lower the barriers to entry for researchers interested in the potential of smart glasses for enhancing communication.