This study presents a robust indexing and retrieval scheme for digital photos with speech annotations, based on syllable-transformed patterns. In speech retrieval applications, out-of-vocabulary words and recognition errors often lead to incorrect transcriptions and therefore degrade retrieval performance. In this study, the recognized n-best syllable candidates for each syllable are regarded as an ordered pattern and converted into an “image-like” pattern using the multidimensional scaling (MDS) method for indexing and retrieval. Vector quantization is then applied to cluster the image vectors into indexing codewords. Finally, an indexing mechanism based on the vector space model (VSM) is used for photo retrieval with speech queries. Experiments were conducted on the speech annotations of 1,055 collected digital photos. Compared with conventional methods, the syllable-transformed pattern method shows a promising improvement in speech-annotated photo retrieval.

Keywords: Speech retrieval, photo retrieval, multidimensional scaling.
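The pipeline summarized above (n-best candidates, MDS, vector quantization, VSM retrieval) can be illustrated with a minimal Python sketch. It is not the implementation used in this study. Several pieces are assumptions made only for illustration: the syllable dissimilarity matrix is random toy data, the "image-like" pattern is taken to be the concatenated MDS coordinates of the ranked candidates, classical (Torgerson) MDS is used as the MDS variant, plain k-means stands in for codebook training, and smoothed TF-IDF with cosine similarity stands in for the VSM weighting. All function names are hypothetical.

```python
import numpy as np


def classical_mds(dist, dim=2):
    """Classical (Torgerson) MDS: embed items so Euclidean distances approximate dist."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dim]          # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def nbest_to_patterns(nbest_ids, syllable_coords):
    """Assumed encoding: concatenate the coordinates of the ranked candidates."""
    nbest_ids = np.asarray(nbest_ids)                # shape (positions, n_best)
    return syllable_coords[nbest_ids].reshape(len(nbest_ids), -1)


def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codeword."""
    vectors = np.asarray(vectors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)


def train_codebook(vectors, k, iters=20, seed=0):
    """Plain k-means, used here as a stand-in for VQ codebook training."""
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iters):
        labels = quantize(vectors, codebook)
        for j in range(k):
            members = vectors[labels == j]
            if len(members):                         # keep empty clusters unchanged
                codebook[j] = members.mean(axis=0)
    return codebook


def tfidf(codeword_seqs, k):
    """Smoothed TF-IDF vectors over codeword indices (assumed VSM weighting)."""
    tf = np.array([np.bincount(s, minlength=k) for s in codeword_seqs], dtype=float)
    df = (tf > 0).sum(axis=0)
    idf = np.log((1 + len(codeword_seqs)) / (1 + df)) + 1.0
    return tf * idf, idf


def rank_photos(query_seq, doc_vectors, idf):
    """Rank documents by cosine similarity to the query's TF-IDF vector."""
    q = np.bincount(query_seq, minlength=len(idf)) * idf
    denom = np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    scores = doc_vectors @ q / np.where(denom == 0, 1.0, denom)
    return np.argsort(-scores), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy symmetric dissimilarity between 6 hypothetical syllables.
    raw = rng.random((6, 6))
    dissim = (raw + raw.T) / 2
    np.fill_diagonal(dissim, 0.0)
    coords = classical_mds(dissim, dim=2)
    # Three annotations, each with four spoken syllables and 3-best candidate ids.
    annotations = [rng.integers(0, 6, size=(4, 3)) for _ in range(3)]
    patterns = [nbest_to_patterns(a, coords) for a in annotations]
    codebook = train_codebook(np.vstack(patterns), k=4)
    docs = [quantize(p, codebook) for p in patterns]
    doc_vecs, idf = tfidf(docs, k=4)
    order, scores = rank_photos(docs[0], doc_vecs, idf)
    print(order, scores)
```

In this sketch a spoken query would pass through the same `nbest_to_patterns` and `quantize` steps as the annotations before being handed to `rank_photos`, so that query and photos are compared in the same codeword space.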