We integrate a supervised machine learning mechanism for detecting erroneous words in the output of a speech recognizer with a two-tier error-correction approach. The approach features (1) a noisy-channel model that replaces erroneous words with generic words, and (2) a phonetic-similarity mechanism that refines these generic words based on a short list of candidate interpretations. Our results, obtained on a corpus of 341 referring expressions, show that the first tier improves interpretation performance and that the second tier improves it further.
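The abstract does not say how the phonetic-similarity tier scores candidates. The Python sketch below shows one plausible form of the second tier only, under stated assumptions. The first tier (the noisy-channel model) is not modelled: its output is taken as given, with each replaced word marked by a hypothetical `<GENERIC>` token. Similarity is assumed to be a normalized Levenshtein distance over phoneme sequences, and the function names, candidate list, and toy ARPAbet-style lexicon are all illustrative rather than the authors' method.

```python
from typing import Dict, List, Sequence

# Hypothetical marker for a word that the first tier replaced with a generic word.
GENERIC = "<GENERIC>"


def phoneme_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over phoneme symbols (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr[j] = min(prev[j] + 1,         # delete pa
                          curr[j - 1] + 1,     # insert pb
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[-1]


def phonetic_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed similarity: 1 minus edit distance normalized by the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - phoneme_edit_distance(a, b) / longest


def refine_generic(tokens: List[str],
                   heard_phonemes: Dict[int, List[str]],
                   candidates: List[str],
                   lexicon: Dict[str, List[str]]) -> List[str]:
    """Replace each generic token with the candidate word that sounds most like
    the phonemes originally recognized at that position."""
    refined = list(tokens)
    for pos, phones in heard_phonemes.items():
        if refined[pos] != GENERIC:
            continue
        # max() keeps the first candidate on ties.
        refined[pos] = max(candidates,
                           key=lambda w: phonetic_similarity(phones, lexicon[w]))
    return refined


# Toy example: the recognizer heard "mag" where the speaker said "mug".
lexicon = {
    "mug": ["M", "AH", "G"],
    "plate": ["P", "L", "EY", "T"],
    "lamp": ["L", "AE", "M", "P"],
}
tokens = ["the", "blue", GENERIC]
heard = {2: ["M", "AE", "G"]}
print(refine_generic(tokens, heard, ["mug", "plate", "lamp"], lexicon))
```

In this toy case "mug" differs from the heard phonemes by one substitution (similarity 2/3), whereas "lamp" and "plate" score 0.25 and 0.0, so the generic word is refined to "mug". Restricting the search to a short candidate list, as the abstract describes, keeps this comparison cheap.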