ISCA Archive Interspeech 2007

Extra large vocabulary continuous speech recognition algorithm based on information retrieval

Valeriy Pylypenko

This paper presents a new two-pass algorithm for Extra Large (more than 1M words) Vocabulary Continuous Speech recognition based on Information Retrieval (ELVIRCOS). The approach decomposes the recognition process into two passes: the first pass uses an information retrieval procedure to build the word subset on which the second pass performs recognition. Word graph composition for continuous speech is also presented. With this approach, high performance can be obtained for large vocabulary speech recognition.
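
The first-pass idea can be illustrated with a minimal sketch. The abstract does not specify the retrieval procedure, so everything below is an assumption made for illustration: the first pass is assumed to yield a phoneme string, the index terms are assumed to be phoneme trigrams, and the names `build_index`, `retrieve_subset`, and the toy `LEXICON` are hypothetical. The second-pass recognizer is not shown.

```python
from collections import Counter, defaultdict


def ngrams(phonemes, n=3):
    """Return all contiguous phoneme n-grams as tuples."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def build_index(lexicon, n=3):
    """Inverted index: phoneme n-gram -> set of words whose pronunciation contains it."""
    index = defaultdict(set)
    for word, pronunciation in lexicon.items():
        for gram in ngrams(pronunciation, n):
            index[gram].add(word)
    return index


def retrieve_subset(index, decoded_phonemes, n=3, top_k=2):
    """Rank words by how many n-grams they share with the first-pass phoneme string."""
    votes = Counter()
    for gram in ngrams(decoded_phonemes, n):
        for word in index.get(gram, ()):
            votes[word] += 1
    return [word for word, _ in votes.most_common(top_k)]


# Toy pronunciation lexicon (hypothetical; a real one would hold more than 1M words).
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "reach": ["r", "iy", "ch"],
    "data": ["d", "ey", "t", "ah"],
}

index = build_index(LEXICON)

# Assumed output of a first-pass phoneme recognizer for one utterance segment.
first_pass = ["s", "p", "iy", "ch"]

# Reduced vocabulary that a second-pass recognizer would search.
subset = retrieve_subset(index, first_pass)
print(subset)
```

In this sketch the second pass would only need to consider the words in `subset` rather than the full vocabulary, which is the source of the intended saving; the value of `top_k` and the trigram term length are arbitrary choices here.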