This technical report describes the submission of team ToTaTo to the NOTSOFAR-1 challenge. Our team participated only in the single-channel track. Our best-performing system uses a Whisper model fine-tuned on the challenge dataset and on voice-converted data. It performs continuous speech separation (CSS) with the recently proposed PixIT framework, which allows us to skip speaker diarization altogether. The system achieves a time-constrained minimum-permutation word error rate (tcpWER) of 41.2% on the challenge evaluation set.