ISCA Archive Eurospeech 1995

Learning language translation in limited domains using finite-state models: some extensions and improvements

J. M. Vilar, A. Marzal, Enrique Vidal

The Onward Subsequential Transducer Inference Algorithm (OSTIA) has been used for learning language translations in limited-domain tasks. Although it is known to converge to the correct model when presented with enough training examples, the amount of training data required can be prohibitive for large vocabularies. We address this problem by clustering words into appropriate classes in both the input and output languages. Experimental results are presented which show that this approach effectively avoids dependency on the size of the vocabulary.
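
As a rough illustration of why word clustering can reduce the dependency on vocabulary size, the sketch below replaces each clustered word with a class label before the sentences would be passed to a learner. It is not the authors' procedure: the class table `INPUT_CLASSES`, the label format, and the helper names `categorize` and `vocabulary` are all assumptions made for this example, and the abstract does not say how the clusters were chosen.

```python
# Hypothetical word classes; the abstract does not list the actual clusters.
INPUT_CLASSES = {
    "red": "<COLOR>",
    "blue": "<COLOR>",
    "circle": "<SHAPE>",
    "square": "<SHAPE>",
}


def categorize(sentence, classes):
    """Replace each clustered word with its class label.

    Returns the relabelled token list and the (label, word) pairs that were
    replaced, so the original words could be restored after translation.
    """
    tokens, fillers = [], []
    for word in sentence.split():
        label = classes.get(word)
        if label is None:
            tokens.append(word)           # word is not in any cluster
        else:
            tokens.append(label)          # word is replaced by its class label
            fillers.append((label, word))
    return tokens, fillers


def vocabulary(sentences, classes):
    """Collect the set of distinct symbols a learner would see after relabelling."""
    vocab = set()
    for sentence in sentences:
        tokens, _ = categorize(sentence, classes)
        vocab.update(tokens)
    return vocab
```

In this toy setting, adding further colors or shapes to the class table enlarges the raw vocabulary but leaves the relabelled symbol set unchanged, which is the effect the abstract attributes to clustering. The same relabelling would be applied to the output language with its own class table.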