ISCA Archive Interspeech 2007
ISCA Archive Interspeech 2007

Speech-based annotation and retrieval of digital photographs

Timothy J. Hazen, Brennan Sherry, Mark Adler

In this paper we describe the development of a speech-based annotation and retrieval system for digital photographs. The system uses a client/server architecture which allows photographs to be captured and annotated on light-weight clients, such as mobile camera phones, and then processed, indexed and stored on networked servers. For speech-based retrieval we have developed a mixed grammar recognition approach which allows the speech recognition system to construct a single finite-state network combining context-free grammars, for recognizing and parsing query carrier phrases and metadata phrases, with an unconstrained statistical n-gram model for recognizing free-form search terms. Experiments demonstrating successful retrieval of photographs using purely speech-based annotation and retrieval are presented.