Multimedia databases hold a growing number of videos that can hardly be accessed by their semantic content. Among the useful indices that can be extracted from the sound track, the presence of a keyword at a given position plays a prominent role. This paper deals with the specific requirements of such a keyword spotter and with the enhancements brought to our previous technique [1], which is based on frame labeling. To be useful, such a keyword spotter has to be speaker independent. Moreover, it has to be able to detect any word from an open vocabulary, which directly implies the use of a phonemic representation of the word. These constraints usually lead to an excessively time-consuming tool. Dividing the indexing process into two parts, the first one off-line and the second one at query time, allows a faster response.
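As a rough illustration of the two-part split, the following minimal sketch assumes that the off-line pass has already assigned one phoneme label to each frame. The frame labels, the toy lexicon, the 10 ms frame step and the function names `build_index` and `spot` are hypothetical and are not taken from the paper. The query-time step uses exact matching of the phoneme string purely for brevity; a practical spotter would need a match that tolerates labeling errors.

```python
from itertools import groupby

FRAME_STEP_S = 0.01  # assumed frame step (10 ms); not specified in the text

# Hypothetical pronunciation lexicon: word -> phoneme sequence.
# An open-vocabulary system would derive this from the query spelling instead.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "at": ["ae", "t"],
}


def build_index(frame_labels):
    """Off-line step: collapse per-frame labels into (phoneme, start_frame) segments."""
    index = []
    pos = 0
    for phoneme, run in groupby(frame_labels):
        index.append((phoneme, pos))
        pos += len(list(run))
    return index


def spot(index, word):
    """Query-time step: start times (in seconds) where the phoneme string of `word` occurs."""
    target = LEXICON[word]
    phonemes = [p for p, _ in index]
    hits = []
    for i in range(len(phonemes) - len(target) + 1):
        if phonemes[i:i + len(target)] == target:
            hits.append(index[i][1] * FRAME_STEP_S)
    return hits


# Toy frame labeling standing in for the output of the off-line pass.
frames = ["sil"] * 5 + ["k"] * 3 + ["ae"] * 6 + ["t"] * 4 + ["sil"] * 2
index = build_index(frames)    # computed once, off-line
cat_hits = spot(index, "cat")  # computed for each query
at_hits = spot(index, "at")
```

In this sketch the costly work (labeling and segment building) happens once per sound track, while each query only scans the compact phoneme index, which is the reason the split can give a faster response.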