ISCA Archive Interspeech 2024

Spoken-to-written text conversion with Large Language Model

HyunJung Choi, Muyeol Choi, Yohan Lim, Minkyu Lee, Seonhui Kim, Seung Yun, Donghyun Kim, SangHun Kim

Improvements in end-to-end speech recognition systems have enhanced the readability of their output, making transcripts easier for users to understand and reducing translation errors. Korean uses both spoken and written forms, so standardizing pronunciation-based notation into written form is crucial for high readability. Inverse Text Normalization (ITN), which converts pronunciation-based text into readable written form, can be applied both to preprocess training corpora and to post-process speech recognition output. Recent Korean ITN research trains transformer models on parallel data containing both notations and therefore suffers performance degradation from data scarcity. This paper proposes using Large Language Models for ITN to overcome the performance decline caused by limited data. The proposed method showed a 12.6% Error Reduction Rate (ERR).
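To make the ITN setup concrete, below is a minimal sketch of how an LLM can be prompted to convert Korean spoken-form (pronunciation) text into written form. It is not the paper's actual method: the OpenAI Chat Completions client, the model name, the prompt wording, and the example sentence are all illustrative assumptions, standing in for whichever LLM and prompting scheme the authors used.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative instruction for spoken-to-written (ITN) conversion; not the paper's prompt.
SYSTEM_PROMPT = (
    "You convert Korean spoken-form (pronunciation) text into its written form. "
    "Normalize numbers, dates, times, and units into standard written notation "
    "while keeping the meaning of the sentence unchanged."
)

def spoken_to_written(spoken: str) -> str:
    """Convert one spoken-form sentence into written form using an LLM."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model, not the one evaluated in the paper
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": spoken},
        ],
        temperature=0.0,  # deterministic decoding is preferable for normalization
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    # Spoken form "이천이십사 년 유월 이십삼 일" should normalize to "2024년 6월 23일".
    print(spoken_to_written("이천이십사 년 유월 이십삼 일에 만나요"))
```

In such a setup, the LLM's broad pretraining is what compensates for the scarcity of parallel spoken/written Korean data that limits transformer models trained only on that corpus.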