ISCA Archive Interspeech 2006

Sequence classification for machine translation

Srinivas Bangalore, Patrick Haffner, Stephan Kanthak

Discriminatively trained classification techniques have been shown to outperform generative techniques on many speech and natural language processing problems. However, most of the research in machine translation has been based on generative modeling techniques. The application of classification techniques to machine translation requires scaling classifiers to deal with very large label sets (the vocabulary of the target language). In this paper, we present a method to scale classifiers to very large label sets and apply it to train classifiers for machine translation. We contrast this approach with a generatively trained machine translation model represented as a weighted finite-state transducer. We show translation accuracy results on spoken language corpora for English-to-Spanish and English-to-Japanese translation tasks.