ISCA Archive ICSLP 2002

A hybrid approach to compounds in LVCSR

Tom Laureys, Vincent Vandeghinste, Jacques Duchateau

In several languages, compound words are written as single orthographic units, which makes it harder to ensure good lexical coverage for large vocabulary continuous speech recognition (LVCSR). A common approach to the problem is to recognize the compound constituents first and then rejoin them through an automatic recompounding process. We describe an accurate compounding module that combines a rule-based approach with statistical pruning. Incorporated into a Dutch broadcast news recognition task, the module yields an 11% relative decrease in word error rate (WER).
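The recompounding step described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' module. The noun+noun joining rule (`rule_allows`), the frequency-ratio test with its `threshold` (`keep_compound`), the function names, and the Dutch example words and counts are all hypothetical stand-ins for the paper's rule-based component and statistical pruning.

```python
from typing import Dict, List, Tuple


# HYPOTHETICAL rule component: a noun modifier may join a noun head.
def rule_allows(left_tag: str, right_tag: str) -> bool:
    return left_tag == "N" and right_tag == "N"


# HYPOTHETICAL statistical pruning: keep a join only if, in some training
# text, the joined spelling is frequent enough relative to the split bigram.
def keep_compound(left: str, right: str,
                  compound_counts: Dict[str, int],
                  bigram_counts: Dict[Tuple[str, str], int],
                  threshold: float) -> bool:
    joined = compound_counts.get(left + right, 0)
    split = bigram_counts.get((left, right), 0)
    total = joined + split
    return total > 0 and joined / total >= threshold


def recompound(words: List[str],
               pos: Dict[str, str],
               compound_counts: Dict[str, int],
               bigram_counts: Dict[Tuple[str, str], int],
               threshold: float = 0.5) -> List[str]:
    """Greedily rejoin adjacent recognized constituents, left to right."""
    out: List[str] = []
    tags: List[str] = []
    for word in words:
        tag = pos.get(word, "?")  # words without a tag never join
        if (out and rule_allows(tags[-1], tag)
                and keep_compound(out[-1], word, compound_counts,
                                  bigram_counts, threshold)):
            out[-1] += word   # extend the compound built so far
            tags[-1] = tag    # the head (rightmost part) sets the category
        else:
            out.append(word)
            tags.append(tag)
    return out


if __name__ == "__main__":
    # Toy tags and counts, invented for illustration only.
    pos = {"de": "Art", "voetbal": "N", "wedstrijd": "N", "begint": "V"}
    print(recompound(["de", "voetbal", "wedstrijd", "begint"], pos,
                     {"voetbalwedstrijd": 120},
                     {("voetbal", "wedstrijd"): 5}))
    # prints ['de', 'voetbalwedstrijd', 'begint']
```

Joining greedily from left to right lets a multi-part compound grow one constituent at a time, and because the joined form inherits the head's tag, the same rule and pruning test apply again at the next step. The rule filters out grammatically implausible joins, while the ratio test prunes joins that the (assumed) training text rarely writes as one word.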