ISCA Archive Eurospeech 2003
ISCA Archive Eurospeech 2003

The 300k LIMSI German broadcast news transcription system

Kevin McTait, Martine Adda-Decker

This paper describes improvements to the existing LIMSI German broadcast news transcription system, especially its extension from a 65k vocabulary to 300k words. Automatic speech recognition for German is more problematic than for a language such as English in that the inflectional morphology of German and its highly generative process of compounding lead to many more out of vocabulary words for a given vocabulary size. Experiments undertaken to tackle this problem and reduce the transcription error rate include bringing the language models up to date, improved pronunciation models, semi-automatically constructed pronunciation lexicons and increasing the size of the system's vocabulary.