ISCA Archive Interspeech 2024

A Small and Fast BERT for Chinese Medical Punctuation Restoration

Tongtao Ling, Yutao Lai, Lei Chen, Shilei Huang, Yi Liu

In clinical dictation, utterances transcribed by automatic speech recognition (ASR) lack explicit punctuation marks, which can lead to misunderstanding of the dictated reports. To produce precise and readable clinical reports from ASR output, automatic punctuation restoration (APR) is required. Considering a practical deployment scenario, we propose a fast and lightweight pre-trained model for Chinese medical punctuation restoration based on the ‘pre-training and fine-tuning’ paradigm. In this work, we distill pre-trained models by incorporating supervised contrastive learning and a novel auxiliary pre-training task (Punctuation Mark Prediction) to make them well-suited for punctuation restoration. We then reformulate APR as a slot tagging problem in the fine-tuning stage to bridge the gap between pre-training and fine-tuning. Our experiments on various distilled models reveal that our model achieves 95% of the performance of the state-of-the-art Chinese RoBERTa with only 10% of its model size.
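The slot-tagging formulation mentioned in the abstract can be illustrated with a minimal sketch (pure Python, hypothetical English tokens and label set, not the authors' actual model or labels): each input token is assigned a label naming the punctuation mark that should follow it, and the punctuated text is reconstructed from those per-token labels.

```python
# Hypothetical label inventory; a real system would use the paper's
# Chinese punctuation marks and a fine-tuned BERT token-classification head.
PUNCT_LABELS = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}

def restore_punctuation(tokens, labels):
    """Rebuild punctuated text from per-token slot labels:
    append each token's predicted mark, then join with spaces."""
    return " ".join(tok + PUNCT_LABELS[lab] for tok, lab in zip(tokens, labels))

# Example: labels as a slot tagger might emit them for an ASR transcript.
tokens = ["the", "patient", "denies", "fever", "any", "questions"]
labels = ["O", "O", "O", "PERIOD", "O", "QUESTION"]
print(restore_punctuation(tokens, labels))
```

In a real pipeline, the `labels` sequence would come from the distilled model's token-level classifier rather than being hand-written.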