ISCA Archive Eurospeech 1999

Enhancing reusability of speech corpora by hyperlinked query output

Andreas Mengel, Ulrich Heid

In speech technology, more and more databases of spoken language are becoming available. For research, the availability of these data makes it possible to study very large corpora. However, these corpora may be represented in different formats, and it is sometimes difficult to relate the annotations of one corpus to those of another. This contribution argues for an XML-based representation of speech corpora that integrates information on various levels of description. In addition, the study of large amounts of speech data requires adequate retrieval mechanisms. A query architecture is described that retrieves encoded entities by specifying their properties or their relations to other entities. The output of the query processor is itself represented in XML and can thus be used as input to further queries or as a new level of description. The work presented here is part of the results of the MATE project (http://mate.mip.ou.dk).
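
The idea of query output that links back into the corpus and can feed a further query can be illustrated with a minimal sketch. It is not the MATE query language or file format: the element names (`corpus`, `word`, `accent`, `result`, `hit`), the attributes (`id`, `pos`, `type`, `href`), and the helper functions `query` and `hit_ids` are all hypothetical and chosen only for illustration. The sketch uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus with two levels of description; accents point at words
# through href attributes. None of these names come from the MATE project.
CORPUS = """
<corpus>
  <words>
    <word id="w1" pos="DET">the</word>
    <word id="w2" pos="NOUN">corpus</word>
    <word id="w3" pos="VERB">grows</word>
  </words>
  <accents>
    <accent id="a1" type="H*" href="#w2"/>
    <accent id="a2" type="L*" href="#w3"/>
  </accents>
</corpus>
"""


def query(root, tag, props=None, related_to=None):
    """Return a <result> element holding one hyperlink per matching entity.

    props:      attribute/value pairs the entity must carry.
    related_to: optional set of ids; if given, the entity's href must
                point at one of them.
    """
    props = props or {}
    result = ET.Element("result")
    for el in root.iter(tag):
        # Property constraint: every requested attribute must match.
        if any(el.get(key) != value for key, value in props.items()):
            continue
        # Relation constraint: the entity must link to an allowed target.
        if related_to is not None:
            if el.get("href", "").lstrip("#") not in related_to:
                continue
        # The output stores a link to the entity, not a copy of it.
        ET.SubElement(result, "hit", href="#" + el.get("id"))
    return result


def hit_ids(result):
    """Collect the ids that a <result> element links to."""
    return {hit.get("href").lstrip("#") for hit in result.iter("hit")}


root = ET.fromstring(CORPUS)

# First query: select entities by property.
nouns = query(root, "word", props={"pos": "NOUN"})

# Second query: reuse the first result to select by relation.
accents_on_nouns = query(
    root, "accent", props={"type": "H*"}, related_to=hit_ids(nouns)
)

print(ET.tostring(accents_on_nouns, encoding="unicode"))
```

Because each result is an XML element made of links rather than copied data, it could be stored alongside the corpus and treated as a further level of description, which is the kind of reuse the abstract argues for.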