ISCA Archive Interspeech 2006

Automatic transcription of Somali language

Abdillahi Nimaan, Pascal Nocéra, Jean-François Bonastre

Most African countries rely on an oral tradition to transmit their cultural, scientific and historical heritage across generations. This ancestral knowledge, accumulated over centuries, is today in danger of disappearing. Automatic transcription and indexing tools are a potential solution for preserving it. This paper presents the first steps toward automatic speech recognition (ASR) of the languages of Djibouti, with the aim of indexing the Djiboutian cultural heritage. This work focuses on the Somali language, which accounts for half of the targeted Djiboutian audio archives. We describe the principal characteristics of the audio (10 hours) and textual (3M words) training corpora we collected, and report the first ASR results for this language. By exploiting a specific property of Somali (words are formed by concatenating sub-words, called "roots" in this paper), we improve on these initial results. We also discuss directions for future research, such as root-based indexing of audio archives.

doi: 10.21437/Interspeech.2006-90

Cite as: Nimaan, A., Nocéra, P., Bonastre, J.-F. (2006) Automatic transcription of Somali language. Proc. Interspeech 2006, paper 1817-Mon2A2O.5, doi: 10.21437/Interspeech.2006-90

  author={Abdillahi Nimaan and Pascal Nocéra and Jean-François Bonastre},
  title={{Automatic transcription of Somali language}},
  booktitle={Proc. Interspeech 2006},
  pages={paper 1817-Mon2A2O.5},