This work describes an HMM-based keyword spotting system. Keywords are modeled as concatenations of phoneme models, so no specific databases are needed to train the system. In addition, no filler models are required, which keeps the computational requirements small. The system consists of two main stages. The first stage extracts segments of the utterance that correspond to possible keywords by maximizing a confidence measure. These segments serve as input hypotheses for the second stage, which computes a second confidence measure. This second measure is determined by comparing the vector of emission probabilities for a hypothesis under the keyword model with the vector of emission probabilities for the best sequence of phonemes in the segment where the hypothesis was detected. The first measure is then linearly combined with the second, yielding a new confidence measure that performs significantly better than the first measure alone.
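The second-stage scoring and the linear combination can be illustrated with a minimal sketch. The abstract does not specify how the two emission-probability vectors are compared or which weights are used, so everything below is an assumption made for illustration: the comparison is taken to be a mean per-frame log-ratio, and the names `second_confidence`, `combined_confidence`, and `weight` are hypothetical.

```python
import math


def second_confidence(keyword_probs, best_phoneme_probs):
    """Compare two per-frame emission-probability vectors.

    Assumption: the comparison is the mean log-ratio between the keyword
    model and the best phoneme sequence over the same segment. The source
    does not state the actual comparison function.
    """
    if not keyword_probs or len(keyword_probs) != len(best_phoneme_probs):
        raise ValueError("vectors must be non-empty and of equal length")
    log_ratios = [
        math.log(k) - math.log(b)
        for k, b in zip(keyword_probs, best_phoneme_probs)
    ]
    return sum(log_ratios) / len(log_ratios)


def combined_confidence(first, second, weight=0.5):
    """Linearly combine the first-stage and second-stage measures.

    Assumption: a convex combination with a single hypothetical weight;
    the source only says the combination is linear.
    """
    return weight * first + (1.0 - weight) * second
```

Under these assumptions, a hypothesis whose keyword-model probabilities match those of the best phoneme sequence scores 0 on the second measure, and lower keyword-model probabilities push the score negative. In practice the weight would be tuned on held-out data.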