ISCA Archive Eurospeech 1999

Automatically deriving categories for translation

Sergio Barrachina, Juan Miguel Vilar

An adequate approach to speech translation for small- to medium-sized tasks is to use subsequential transducers, a finite-state model, as the language model of a speech recognizer. These transducers can be trained automatically from sample corpora. When the available data are scarce, manually defined categories improve the training of the subsequential transducers. These categories depend on both the source and the target language of the translation task. We introduce an automatic approach to deriving categories that can be used in training subsequential transducers. The approach extends monolingual word clustering methods to the bilingual case using alignments obtained from statistical models. Experimental results indicate that models trained with these categories make fewer translation errors.
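As a rough illustration of how bilingual categories might be applied to a training pair, the following Python sketch replaces aligned word pairs by a shared category label. It is not the authors' procedure. The function name `categorize`, the label `$ROOM_TYPE`, the toy sentence pair, the alignment links, and the category table are all assumptions made for this example, and the clustering step that would produce the category table is not shown.

```python
def categorize(src, tgt, alignment, categories):
    """Replace aligned word pairs that belong to a category by its label.

    src, tgt   -- lists of source and target words
    alignment  -- list of (i, j) links: src[i] is aligned with tgt[j]
    categories -- dict mapping (source word, target word) to a label
                  (hypothetical; a bilingual clustering would supply it)
    """
    src_out, tgt_out = list(src), list(tgt)
    for i, j in alignment:
        label = categories.get((src[i], tgt[j]))
        if label is not None:
            # The same label goes on both sides, so the pair stays linked.
            src_out[i] = label
            tgt_out[j] = label
    return src_out, tgt_out


# Toy example (invented data, not taken from the paper's corpus).
src = "un cuarto doble".split()
tgt = "a double room".split()
alignment = [(0, 0), (1, 2), (2, 1)]
categories = {
    ("doble", "double"): "$ROOM_TYPE",
    ("individual", "single"): "$ROOM_TYPE",
}

src_cat, tgt_cat = categorize(src, tgt, alignment, categories)
print(" ".join(src_cat))
print(" ".join(tgt_cat))
```

Under these assumptions, sentence pairs that differ only in a categorized word pair collapse to the same categorized pair, which is one way such categories could ease training from scarce data.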