ISCA Archive Interspeech 2005

Structural metadata annotation: moving beyond English

Stephanie Strassel, Jáchym Kolár, Zhiyi Song, Leila Barclay, Meghan Glenn

The goal of metadata extraction (MDE) is to enable technology that can take raw speech-to-text output and refine it into forms that are more useful to humans and to downstream automatic processes. Starting in 2003, a structural metadata annotation task was defined for English as part of the DARPA EARS Program. A significant new challenge for MDE is the addition of new languages. This paper reports on work undertaken to apply MDE annotation to data from three very different languages: Mandarin Chinese, Levantine Arabic, and conversational Czech. Details of annotation task modifications are provided for each language, along with a general overview of data and annotation tools for non-English MDE.