ISCA Archive ICSLP 2002

SpeechFind: an experimental on-line spoken document retrieval system for historical audio archives

Bowen Zhou, John H. L. Hansen

In this study, we present the SpeechFind system, an experimental online spoken document retrieval system for historical audio archives. As part of an ongoing U.S. NSF Digital Library Initiative project, the National Gallery of the Spoken Word (NGSW), SpeechFind is intended to serve as an audio index and search engine for spoken word collections spanning the 20th century, comprising as much as 60,000 hours of audio. In this paper, we describe the system architecture of SpeechFind, with a focus on the audio data transcription and information retrieval components. Using a sample test audio data collection from the past 60 years, we present an evaluation of the individual system components and of overall performance.