Entity Resolution (ER) in spoken dialog systems can suffer from phonetic variation in search queries caused by Automatic Speech Recognition (ASR) errors. In this paper, we propose a phonetic embedding technique that improves the robustness of the ER system to this variation. The technique comprises a phonetic embedding model, a training-data augmentation and sampling method, and an ASR robustness evaluation methodology. We test it on two use cases: voice search for videos and for books in the e-commerce domain. Combined with a semantic embedding neural vector search (NVS) model, phonetic embedding reduces the retrieval error rate by 7.07% relative for video and 4.23% for books compared to NVS without phonetic embedding, and by 49.9% for video and 35.3% for books compared to a lexical search baseline.
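
As a reading aid for the figures above: a relative error-rate reduction is conventionally the drop in error rate divided by the baseline error rate. The sketch below assumes that standard definition, which the abstract does not state explicitly. The function name and the error rates in the usage lines are hypothetical and are not values from this work.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the relative error-rate reduction, in percent.

    Assumes the conventional definition (baseline - new) / baseline.
    The abstract does not spell out the formula, so this is an assumption.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error rates, chosen only for illustration (not from the paper).
example = relative_error_reduction(baseline_error=0.20, new_error=0.15)
print(f"{example:.1f}% relative reduction")
```

Under this definition, a relative reduction says nothing about the absolute error rates themselves: the same percentage can correspond to a large or a small absolute change depending on the baseline.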