We present listen-and-confirm (LAC), a human-in-the-loop approach for improving extremely low-resource convolutional neural network (CNN)-based keyword spotting (KWS). LAC interactively presents short audio segments, detected by the KWS, to a human evaluator who confirms whether or not the keyword is present. These LAC responses are used to adjust the CNN training targets, which in turn yields an improved KWS. Experiments were conducted in English, to allow controlled experimentation, and in Bambara, a severely under-resourced Malian language that reflects the true operational setting in which the KWS is currently used for humanitarian support. Relative improvements in mean precision of 38.18% for English and 21.62% for Bambara were achieved on an audio query-by-example task after incorporating feedback from 50 LAC evaluations per keyword type. As a key finding, we show that LAC improves keyword spotting performance even when the human evaluator is completely unfamiliar with the target language. LAC can therefore be used to support rapid KWS development in an entirely new language.
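To make the target-adjustment step concrete, the sketch below shows one way LAC-style yes/no confirmations could be folded back into per-segment CNN training targets. This is a minimal illustration under assumed conventions (soft targets in [0, 1], a hard positive/negative update, and the function name update_targets are all hypothetical), not the paper's actual implementation.

```python
import numpy as np

def update_targets(targets, detections, confirmations, pos=1.0, neg=0.0):
    """Hypothetical sketch: adjust per-segment training targets from LAC feedback.

    targets:       (num_segments,) array of current soft targets in [0, 1]
    detections:    indices of segments the KWS flagged as keyword hits
    confirmations: parallel list of booleans from the human evaluator
    """
    targets = targets.copy()
    for idx, confirmed in zip(detections, confirmations):
        # A confirmed detection becomes a positive training example;
        # a rejected one becomes a negative example for the next round.
        targets[idx] = pos if confirmed else neg
    return targets

# Toy usage: five segments, two flagged by the KWS, one confirmed.
targets = np.full(5, 0.5)  # uncertain initial targets
new_targets = update_targets(targets, detections=[1, 3],
                             confirmations=[True, False])
print(new_targets)  # [0.5 1.  0.5 0.  0.5]
```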