This paper presents a method for indexing spoken utterances which combines lexical and phonetic hypotheses in a hybrid index built from automata. The retrieval is realised by a lexical-phonetic and semi-imperfect matching whose aim is to improve the recall. A feature vector, containing edit distance scores and a confidence measure, weights each transition to help the filtering of the candidate utterance list for a more precise search. Experiment results show the complementarity of the lexical and phonetic representations, and compare the hybrid search with the state-of-the-art cascaded search to retrieve named entity queries.
Index Terms: information retrieval, speech indexing, lexical-phonetic automata, confidence measures, edit distances, supervised learning