ISCA Archive IWSLT 2004

Phrase-based alignment combining corpus cooccurrences and linguistic knowledge

Adrià de Gispert, José B. Mariño, Josep M. Crego

This paper introduces a phrase alignment strategy that seeks phrase and word links in two stages, using cooccurrence measures and linguistic information. In the first stage, the algorithm finds high-precision links involving a linguistically derived set of phrases, leaving word alignment to be performed in the second stage. Experiments were carried out on an English-Spanish parallel corpus, and we show that phrase cooccurrence measures convey information complementary to word cooccurrences and provide stronger evidence of a good alignment. Alignment Error Rate (AER) results are presented; they are competitive with, and even outperform, state-of-the-art alignment algorithms.
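
For readers unfamiliar with the metric, the sketch below shows how Alignment Error Rate is commonly computed from a hypothesis alignment and a reference annotated with sure and possible links. The abstract does not state the formula, so this assumes the widely used definition AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where S is a subset of P. The function name and the representation of links as (source index, target index) pairs are illustrative choices, not taken from the paper.

```python
def alignment_error_rate(hypothesis, sure, possible):
    """Compute AER under the commonly used sure/possible definition.

    Each argument is an iterable of (source_index, target_index) links.
    Assumption: sure links count as possible links too, so they are
    merged into the possible set here.
    """
    a = set(hypothesis)
    s = set(sure)
    p = set(possible) | s  # enforce S as a subset of P

    # Nothing proposed and nothing required: treat as zero error.
    if not a and not s:
        return 0.0

    matched_sure = len(a & s)      # drives recall
    matched_possible = len(a & p)  # drives precision
    return 1.0 - (matched_sure + matched_possible) / (len(a) + len(s))


if __name__ == "__main__":
    # Toy example with made-up links, for illustration only.
    hyp = {(0, 0), (1, 1), (2, 3)}
    sure_links = {(0, 0), (1, 1)}
    possible_links = {(2, 2)}
    print(alignment_error_rate(hyp, sure_links, possible_links))
```

Lower values are better: a hypothesis that contains every sure link and no link outside the possible set scores zero.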