Keyword spotting (KWS) is a key technique in human–machine interaction, used to detect trigger phrases and voice commands. In practice, consumers and device vendors often want to define their own keywords conveniently. In this paper, we propose a novel template-matching approach to KWS based on an end-to-end deep learning method, which uses an attention mechanism to match the input voice to keyword templates in a high-level feature space. The proposed approach needs only a few voice samples (as few as one) to register a new keyword, without any retraining. We conduct experiments on the publicly available Google Speech Commands dataset. The experimental results demonstrate that our method outperforms baseline methods while allowing flexible configuration.
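
To make the idea of attention-based template matching concrete, the following is a minimal sketch and not the architecture proposed in this paper. It assumes that a trained encoder (omitted here) has already mapped each utterance to a sequence of high-level feature vectors. The names `match_score`, `KeywordRegistry`, and the threshold value are hypothetical and chosen only for illustration. Each template frame attends over the input frames with scaled dot-product attention, and the score is the mean cosine similarity between each template frame and its attended input summary.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot(a, b) / (na * nb)

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def match_score(input_feats, template_feats):
    """Hypothetical matching score between two feature sequences.

    Both arguments are lists of equal-dimension feature vectors, assumed
    to come from an encoder that is not shown. Sequence lengths may differ.
    """
    d = len(template_feats[0])
    sims = []
    for q in template_feats:
        # Scaled dot-product attention of one template frame over the input.
        weights = softmax([dot(q, x) / math.sqrt(d) for x in input_feats])
        attended = [sum(w * x[i] for w, x in zip(weights, input_feats))
                    for i in range(d)]
        sims.append(cosine(q, attended))
    return sum(sims) / len(sims)

class KeywordRegistry:
    """Stores keyword templates; adding a keyword needs no retraining."""

    def __init__(self):
        self.templates = {}

    def register(self, keyword, template_feats):
        # One sample is enough; further samples are kept as extra templates.
        self.templates.setdefault(keyword, []).append(template_feats)

    def spot(self, input_feats, threshold=0.8):
        # threshold is an illustrative value, not taken from the paper.
        best_kw, best = None, -1.0
        for kw, temps in self.templates.items():
            for t in temps:
                s = match_score(input_feats, t)
                if s > best:
                    best_kw, best = kw, s
        return (best_kw, best) if best >= threshold else (None, best)
```

In this sketch, registering a keyword only stores its encoded template, which mirrors the property that new keywords require no retraining. A real system would learn the encoder and the matching function jointly rather than use a fixed cosine score.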