In previous work [1], we reported initial word spotting results obtained with our phonetically based word spotter. In this paper, we first present a new version of our forward-backward-based keyword score. We then concentrate on the choice of an HMM configuration, where a configuration defines how keyword and non-keyword components are combined. Possible components are a phoneme loop, a large or reduced vocabulary, and a language model. We examine various combinations of these components and discuss issues in choosing an appropriate configuration for a given application. We then show how keyword spotting can be extended to "event" spotting by substituting a sub-grammar that describes the event for the keyword. We report experimental results for various configurations and explore how lexical coverage and the presence of a phoneme loop affect overall performance. We show that the vocabulary can be significantly reduced with limited impact on performance. Finally, we present initial results on an event spotting task.
Keywords: word spotting, hidden Markov model