ISCA Archive Interspeech 2011

Woefzela - an open-source platform for ASR data collection in the developing world

Nic J. de Vries, Jaco Badenhorst, Marelie H. Davel, Etienne Barnard, Alta de Waal

Building transcribed speech corpora for under-resourced languages plays a pivotal role in developing speech technologies for such languages. We have developed an open-source tool for devices running the Android operating system to facilitate the efficient collection of speech data for automatic speech recognition (ASR) system development. The tool was designed for use in typical developing-world conditions; we present the relevant design choices and analyse the effectiveness of the tool by means of a case study. In particular, we introduce a novel semi-real-time quality monitoring system, which increases the efficiency of the data collection process.