An efficient indexing scheme is essential for spoken term detection (STD) on large databases, particularly for phone-based systems, which have been widely adopted to achieve vocabulary-independent detection. While finite state transducer (FST) composition provides a standard indexing approach, n-gram reverse indexing is more flexible in representing connectivity and measuring confidence, and may therefore yield better performance than searching within the original lattices or the equivalent FSTs. In this paper we present an n-gram FST indexing approach that combines the flexibility of n-gram indexing with the efficiency of FST indexing. Specifically, we employ n-gram indexing to relax the connectivity in the original lattices and then formalize the indices as an FST for online search. We demonstrate this approach on a phone-based STD task where the lattices are sparse due to strong language models. The results show that n-gram FST indexing provides not only better detection performance but also faster detection than both conventional n-gram indexing and FST indexing.
Index Terms: spoken term indexing, finite state transducer, spoken term detection, speech recognition