ISCA Archive Interspeech 2025

Enhancing Transcripts of Open-Source Automatic Speech Recognition Models Through Fine-Tuning with Laughter and Speech-Laugh

Phuoc Hoang Ho, Dragoș Alexandru Bălan, Dirk K. J. Heylen, Khiet P. Truong

Non-lexical sounds such as laughter are considered important discourse markers in conversation because they complement lexical information in shaping interpersonal relations, managing the conversation, and expressing attitudes and affect. However, general-purpose open-source automatic speech recognition (ASR) systems typically do not focus on these sounds. Laughter transcription is an understudied task in ASR: laughter is often not modelled as a token and is discarded when ASR output is evaluated. In this study, we investigate how current open-source ASR models handle laughter and speech-laugh (speech interspersed with laughter). Using Switchboard and BuckEye as conversational speech corpora, we fine-tuned and evaluated Whisper and Wav2Vec2. Our results show that laughter can be integrated into ASR transcriptions without substantially degrading word error rate.