ISCA Archive Interspeech 2015

Discriminative data selection for lightly supervised training of acoustic model using closed caption texts

Sheng Li, Yuya Akita, Tatsuya Kawahara

We present a novel data selection method for lightly supervised training of acoustic models, which exploits a large amount of data that comes with closed caption texts rather than faithful transcripts. In the proposed scheme, the sequence of the closed caption text and that of the ASR hypothesis generated by the baseline system are aligned. A set of dedicated classifiers is then designed and trained to select the correct one of the two or to reject both. We demonstrate that the classifiers effectively filter the usable data for acoustic model training without tuning any threshold parameters. The method achieves a significant improvement in ASR accuracy over the baseline system, and also over the conventional method of lightly supervised training based on simple matching and confidence measure scores.
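
The align-then-select flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the alignment algorithm, the unit of alignment, or the classifier features. Here the alignment is assumed to be at the word level and is done with Python's standard `difflib.SequenceMatcher`; the `choose` callback is a hypothetical stand-in for the trained classifiers, returning `"caption"`, `"asr"`, or `"reject"` for each segment where the two sequences disagree.

```python
from difflib import SequenceMatcher


def align(caption, hypothesis):
    """Align two word lists and yield (tag, caption_words, hypothesis_words).

    tag is one of difflib's opcodes: 'equal', 'replace', 'delete', 'insert'.
    Word-level alignment is an assumption made for this sketch.
    """
    matcher = SequenceMatcher(a=caption, b=hypothesis, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        yield tag, caption[i1:i2], hypothesis[j1:j2]


def select(caption, hypothesis, choose):
    """Return the segments kept for training as (source, words) pairs.

    `choose` is a hypothetical placeholder for the trained classifiers:
    it takes the two disagreeing word lists and returns 'caption',
    'asr', or 'reject'.
    """
    selected = []
    for tag, cap_words, hyp_words in align(caption, hypothesis):
        if tag == "equal":
            # Both sources agree, so the segment is kept as is.
            selected.append(("agree", cap_words))
            continue
        decision = choose(cap_words, hyp_words)
        if decision == "caption":
            selected.append(("caption", cap_words))
        elif decision == "asr":
            selected.append(("asr", hyp_words))
        # Any other decision (e.g. 'reject') drops the segment.
    return selected


if __name__ == "__main__":
    caption = "the cat sat down".split()
    hypothesis = "the cat sit down".split()
    # Toy decision rule for illustration only: always trust the ASR output.
    for source, words in select(caption, hypothesis, lambda c, h: "asr"):
        print(source, " ".join(words))
```

Because each disagreeing segment receives a categorical decision, no confidence threshold appears anywhere in this sketch, which mirrors the threshold-free property claimed in the abstract. In a real system the toy rule would be replaced by classifiers trained on labeled alignment patterns.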