ISCA Archive Interspeech 2012

Lexical-phonetic automata for spoken utterance indexing and retrieval

Julien Fayolle, Murat Saraçlar, Fabienne Moreau, Christian Raymond, Guillaume Gravier

This paper presents a method for indexing spoken utterances that combines lexical and phonetic hypotheses in a hybrid index built from automata. Retrieval is realised by a lexical-phonetic and semi-imperfect matching whose aim is to improve recall. A feature vector, containing edit distance scores and a confidence measure, weights each transition and helps filter the candidate utterance list for a more precise search. Experimental results show the complementarity of the lexical and phonetic representations, and compare the hybrid search with the state-of-the-art cascaded search on the retrieval of named-entity queries.

Index Terms: information retrieval, speech indexing, lexical-phonetic automata, confidence measures, edit distances, supervised learning
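
The idea summarised in the abstract (scoring each candidate with edit distances and a confidence measure, then filtering the candidate list) can be illustrated with a minimal sketch. This is not the authors' automaton-based index: it works on flat word and phone sequences, and the names `Hypothesis`, `features`, `retrieve`, the threshold `max_phonetic`, the sample data, and the hand-set filtering rule are all assumptions made for illustration. The paper's index terms mention supervised learning, whereas this sketch uses a fixed threshold.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


@dataclass
class Hypothesis:
    """Hypothetical container for one recognised segment of an utterance."""
    utterance_id: str
    words: List[str]    # lexical hypothesis
    phones: List[str]   # phonetic hypothesis
    confidence: float   # recogniser confidence in [0, 1]


def features(query_words: Sequence[str], query_phones: Sequence[str],
             hyp: Hypothesis) -> Tuple[float, float, float]:
    """Feature vector: (lexical distance, phonetic distance, confidence).

    Distances are normalised by the query length (an assumption).
    """
    lexical = edit_distance(query_words, hyp.words) / max(len(query_words), 1)
    phonetic = edit_distance(query_phones, hyp.phones) / max(len(query_phones), 1)
    return (lexical, phonetic, hyp.confidence)


def retrieve(query_words: Sequence[str], query_phones: Sequence[str],
             hyps: Sequence[Hypothesis], max_phonetic: float = 0.34):
    """Keep exact lexical matches or close phonetic matches, best first."""
    kept = []
    for h in hyps:
        lex, pho, conf = features(query_words, query_phones, h)
        if lex == 0.0 or pho <= max_phonetic:
            kept.append((h.utterance_id, (lex, pho, conf)))
    # Rank by phonetic distance, then by decreasing confidence.
    kept.sort(key=lambda item: (item[1][1], -item[1][2]))
    return kept


# Invented sample data: a named-entity query and three candidate segments.
query_words = ["obama"]
query_phones = ["o", "b", "a", "m", "a"]
candidates = [
    Hypothesis("u1", ["obama"], ["o", "b", "a", "m", "a"], 0.9),
    Hypothesis("u2", ["alabama"], ["a", "l", "a", "b", "a", "m", "a"], 0.8),
    Hypothesis("u3", ["o", "bamma"], ["o", "b", "a", "m", "e"], 0.4),
]

for utt_id, feats in retrieve(query_words, query_phones, candidates):
    print(utt_id, feats)
```

In this toy setting, a segment whose word-level hypothesis misses the query can still be retrieved through its phone sequence, which is the kind of complementarity between lexical and phonetic representations that the abstract reports.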