ISCA Archive Eurospeech 1997

Use of broadcast news materials for speech recognition benchmark tests

David S. Pallett, Jonathan G. Fiscus, William M. Fisher, John S. Garofolo

This paper reports on the use of materials derived from radio and television news broadcasts for research on and testing of large-vocabulary Continuous Speech Recognition (CSR) technology. NIST implemented tests using these materials on behalf of the DARPA-funded speech recognition research community in 1995 and 1996, and the tests are expected to continue for the next several years. Four research groups participated in the 1995 tests, and nine groups (at eight sites) participated in the 1996 tests. This paper documents properties of the training and test materials, describes the detailed annotation and transcription protocol used for more than 100 hours of recorded data made available through the Linguistic Data Consortium (LDC), and discusses the test protocols and results of both the 1995 and 1996 Benchmark Tests.