ISCA Archive Interspeech 2005

Diachronic vocabulary adaptation for broadcast news transcription

Alexandre Allauzen, Jean-Luc Gauvain

This article investigates the use of Internet news sources to automatically adapt the vocabulary of a French and an English broadcast news transcription system. A specific method is developed to gather training, development and test corpora from selected websites and to normalize them for further use. A vectorial vocabulary adaptation algorithm is described which interpolates word frequencies estimated on adaptation corpora so as to directly maximize lexical coverage on a development corpus. To test the generality of this approach, experiments were carried out in parallel in French and in English (UK) on a daily basis throughout May 2004. In both languages, the out-of-vocabulary (OOV) rate is reduced by more than half.
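The abstract does not detail the optimization procedure, so the following is only a minimal Python sketch of the idea under stated assumptions: each adaptation corpus is reduced to a relative word-frequency vector, the vectors are linearly interpolated with weights summing to 1, the vocabulary is the `size` most frequent words of the interpolated vector, and the weights are chosen to minimize the OOV rate (i.e. maximize coverage) on a development corpus. The exhaustive grid search over weights is a stand-in chosen for clarity, not the authors' optimizer, and all function names and toy corpora are hypothetical.

```python
from collections import Counter
from itertools import product


def relative_frequencies(tokens):
    """Map each word to its relative frequency in a tokenized corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def build_vocabulary(freq_vectors, weights, size):
    """Linearly interpolate the frequency vectors and keep the `size` top-ranked words."""
    mixed = Counter()
    for freqs, weight in zip(freq_vectors, weights):
        for word, freq in freqs.items():
            mixed[word] += weight * freq
    # Rank by decreasing interpolated frequency; ties are broken alphabetically.
    ranked = sorted(mixed.items(), key=lambda item: (-item[1], item[0]))
    return {word for word, _ in ranked[:size]}


def oov_rate(vocab, dev_tokens):
    """Fraction of (non-empty) development tokens that fall outside `vocab`."""
    misses = sum(1 for word in dev_tokens if word not in vocab)
    return misses / len(dev_tokens)


def simplex_grid(n, steps):
    """Yield every weight vector of length n with entries k/steps that sum to 1."""
    for head in product(range(steps + 1), repeat=n - 1):
        if sum(head) <= steps:
            yield tuple(k / steps for k in head) + ((steps - sum(head)) / steps,)


def adapt_vocabulary(corpora, dev_tokens, size, steps=10):
    """Return (oov_rate, weights, vocab) with the lowest OOV rate on the dev corpus.

    Assumption: exhaustive grid search stands in for the paper's optimizer.
    """
    freq_vectors = [relative_frequencies(corpus) for corpus in corpora]
    best = None
    for weights in simplex_grid(len(corpora), steps):
        vocab = build_vocabulary(freq_vectors, weights, size)
        rate = oov_rate(vocab, dev_tokens)
        # Keep the first weight vector reaching the lowest OOV rate.
        if best is None or rate < best[0]:
            best = (rate, weights, vocab)
    return best


if __name__ == "__main__":
    # Hypothetical toy corpora: older news vs. recent news, plus a dev sample.
    archive = "the government announced the budget the minister said".split()
    recent = "the election results the candidate won the election".split()
    dev = "the candidate won the election the minister".split()
    rate, weights, vocab = adapt_vocabulary([archive, recent], dev, size=5)
    print(weights, round(rate, 3))  # prints (0.0, 1.0) 0.143
```

In this toy run all weight goes to the more recent corpus, which is the intuition behind diachronic adaptation. Because every grid point is evaluated, the cost grows quickly with the number of corpora; the sketch is meant only to make the objective (development-corpus coverage as a function of the interpolation weights) concrete.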

doi: 10.21437/Interspeech.2005-23

Cite as: Allauzen, A., Gauvain, J.-L. (2005) Diachronic vocabulary adaptation for broadcast news transcription. Proc. Interspeech 2005, 1305-1308, doi: 10.21437/Interspeech.2005-23

  author={Alexandre Allauzen and Jean-Luc Gauvain},
  title={{Diachronic vocabulary adaptation for broadcast news transcription}},
  booktitle={Proc. Interspeech 2005},