ISCA Archive ICSLP 1992

Multi-site data collection for a spoken language corpus - MAD COW

Lynette Hirschman

This paper describes the multi-site spoken language data collection procedure for the ATIS (Air Travel Information System) domain, which has been co-ordinated by MADCOW (Multi-site ATIS Data Collection Working group). We summarize the motivation for this effort, the implementation of the multi-site data collection paradigm, and the accomplishments of MADCOW in monitoring the collection and distribution of 14,000 utterances of spontaneous speech from five sites for use in a multi-site common evaluation of speech, natural language and spoken language.