ISCA Archive IWSLT 2004

Statistical machine translation of spontaneous speech with scarce resources

Evgeny Matusov, Maja Popović, Richard Zens, Hermann Ney

This paper addresses statistical machine translation of spontaneous speech with a limited amount of training data. We propose a method for selecting relevant additional training data from other sources, possibly from other domains. We present two ways to alleviate the data sparseness problem by incorporating morphological information into the EM training of word alignments. We show that using part-of-speech information to harmonize word order between source and target sentences yields significant improvements in the BLEU score.