ISCA Archive SpeechProsody 2004

Querying annotated speech corpora

Ulrike Gut, Jan-Torsten Milde, Holger Voormann, Ulrich Heid

This paper is concerned with querying annotated speech corpora. A growing number of such corpora are currently being created worldwide; however, their usefulness to the wider research community is restricted by the lack of standard tools for creating, editing, annotating, storing and querying them. Two solutions to these problems are presented here: the XML-based data format TASX, for corpus creation and data format exchange, and the NXT search tool, for querying corpora. Both have been applied to the multi-level annotated LeaP corpus of non-native speech.