ISCA Archive Interspeech 2006

Building an English-Iraqi Arabic machine translation system for spoken utterances with limited resources

Jason Riesa, Behrang Mohit, Kevin Knight, Daniel Marcu

This paper presents an English-Iraqi Arabic speech-to-speech statistical machine translation system built with limited resources. We examine the constraints involved, describe how we mitigated problems such as a non-standard orthography and a highly inflected grammar, and discuss how the plentiful existing resources for Modern Standard Arabic can be leveraged to assist in this task. Combined, these techniques reduce the number of unknown words at translation time by over 40% and increase the BLEU score by 3.65 over a previous state-of-the-art system that uses the same parallel training corpus of spoken utterances.