ISCA Archive ICSLP 1990

A national database of spoken language: concept, design, and implementation

J. Bruce Millar, P. Dermody, M. Harrington, Julie Vonwiller

A model is proposed for the building of a national resource of spoken language data in the form of a cluster of compatible databases. Each component of the cluster will have its own linguistic characteristics, dependent on the primary purpose behind its collection. However, each component corpus will have the same structure and the same standards of data description. The emphasis is on adequate description of the data rather than on conformity to a standard of recording conditions, data storage, or linguistic content. This paper outlines the rationale for such a database and proposes principles for the structuring of data storage and for the description of important dimensions of such spoken language data. Some attention is also given to the management of such a database within the speech and language technology community.