ISCA Archive ISCSLP 2006

Full Utilization of Closed-captions in Broadcast News Recognition

Meng Meng, Shijin Wang, Jiaen Liang, Peng Ding, Bo Xu

Lightly supervised acoustic model training has been recognized as an effective way to improve acoustic models for broadcast news recognition. In this paper, a new approach is introduced that both fully utilizes un-transcribed data by using closed captions as transcripts and selects more informative data for acoustic model training. We show that this approach is superior to the conventional method, which filters data based only on the degree of matching between closed captions and ASR results, without considering how useful the data is for training. As a by-product, the approach yields an approximately correct transcription that can serve as a starting point for manual correction, which greatly reduces the manual effort needed for detailed annotation.

Keywords: lightly supervised acoustic model training, closed-caption, ASR.
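For illustration only, the conventional filtering baseline mentioned in the abstract (keeping a segment when its closed caption agrees closely enough with the ASR output) might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `match_rate` and `filter_segments`, the dictionary keys `caption` and `asr`, the word-level matching measure, and the threshold of 0.8 are all hypothetical. The data selection step of the proposed approach is not described in the abstract and is not shown here.

```python
from difflib import SequenceMatcher


def match_rate(caption: str, hypothesis: str) -> float:
    """Fraction of words that align between a caption and an ASR hypothesis.

    Assumption: agreement is measured at the word level after lowercasing,
    normalized by the length of the longer word sequence.
    """
    ref = caption.lower().split()
    hyp = hypothesis.lower().split()
    if not ref and not hyp:
        return 1.0  # two empty strings agree trivially
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(ref), len(hyp))


def filter_segments(segments, threshold=0.8):
    """Keep segments whose caption/ASR agreement reaches the threshold.

    Each segment is assumed to be a dict with hypothetical keys
    "caption" (closed-caption text) and "asr" (recognizer output).
    """
    return [
        seg for seg in segments
        if match_rate(seg["caption"], seg["asr"]) >= threshold
    ]
```

A filter of this kind looks only at agreement between the two text sources, which is exactly the limitation the abstract points out: a segment can match well and still add little to acoustic model training.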