ISCA Archive CHiME 2024

ToTaTo System Descriptions for the NOTSOFAR1 Challenge

Joonas Kalda, Tanel Alumae, Séverin Baroudi, Martin Lebourdais, Hervé Bredin, Ricard Marxer

This technical report describes the submission of team ToTaTo to the NOTSOFAR1 challenge. Our team participated only in the single-channel track of the challenge. Our best-performing system uses a Whisper model fine-tuned on the challenge dataset and on voice-converted data. It performs continuous speech separation (CSS) through the recently proposed PixIT framework, which makes it possible to skip speaker diarization altogether. The system achieves a tcpWER of 41.2% on the challenge evaluation set.