ISCA Archive Interspeech 2006

Investigating automatic decomposition for ASR in less represented languages

Thomas Pellegrini, Lori Lamel

This paper addresses the use of an automatic decomposition method to reduce lexical variety and thereby improve speech recognition for less well-represented languages. Amharic was selected for these experiments because only a small quantity of resources is available for it compared to well-covered languages. Inspired by the Harris algorithm, the method automatically generates plausible affixes which, combined with decompounding, can reduce the size of the lexicon and the out-of-vocabulary (OOV) rate. Recognition experiments are carried out for four different configurations (full-word and decompounded), using supervised training with a corpus containing only two hours of manually transcribed data.
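To illustrate the general idea, the following is a minimal sketch of a Harris-style successor-variety segmentation and its effect on lexicon size and OOV rate. It is not the authors' implementation: the function names, the threshold of 2, and the toy English word lists are all assumptions chosen for readability, and the paper itself works on Amharic with a method only inspired by this idea.

```python
from collections import defaultdict


def successor_variety(words):
    """Map each proper prefix to the number of distinct letters that follow it."""
    successors = defaultdict(set)
    for word in words:
        for i in range(1, len(word)):
            successors[word[:i]].add(word[i])
    return {prefix: len(chars) for prefix, chars in successors.items()}


def decompose(word, variety, threshold=2):
    """Split a word after every prefix whose successor variety reaches the threshold.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    pieces, start = [], 0
    for i in range(1, len(word)):
        if variety.get(word[:i], 0) >= threshold:
            pieces.append(word[start:i])
            start = i
    pieces.append(word[start:])
    return pieces


def oov_rate(tokens, vocab):
    """Fraction of tokens that are missing from the vocabulary."""
    return sum(token not in vocab for token in tokens) / len(tokens)


# Toy data (assumed for illustration only).
train = ["walk", "walks", "walked", "talk", "talks", "talking"]
test = ["walking", "talked"]

variety = successor_variety(train)

# Full-word configuration: the lexicon is the set of training words.
full_vocab = set(train)

# Decomposed configuration: the lexicon is the set of sub-word units.
unit_vocab = {unit for word in train for unit in decompose(word, variety)}
test_units = [unit for word in test for unit in decompose(word, variety)]

print(len(full_vocab), oov_rate(test, full_vocab))        # 6 1.0
print(len(unit_vocab), oov_rate(test_units, unit_vocab))  # 5 0.0
```

On this toy data the unseen forms "walking" and "talked" are covered once they are split into units already observed in training, which is the effect on lexicon size and OOV rate that the abstract describes. A real system would also need to mark affixes so that words can be rejoined after recognition.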


doi: 10.21437/Interspeech.2006-89

Cite as: Pellegrini, T., Lamel, L. (2006) Investigating automatic decomposition for ASR in less represented languages. Proc. Interspeech 2006, paper 1776-Mon2A2O.4, doi: 10.21437/Interspeech.2006-89

@inproceedings{pellegrini06_interspeech,
  author={Thomas Pellegrini and Lori Lamel},
  title={{Investigating automatic decomposition for ASR in less represented languages}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1776-Mon2A2O.4},
  doi={10.21437/Interspeech.2006-89},
  issn={2958-1796}
}