We present experimental evidence from a study in which we monitored eye movements as people responded to pre-recorded instructions produced by a human speaker and by two text-to-speech synthesizers. We replicate findings demonstrating that people process human speech incrementally, making partial commitments as a word unfolds. Specifically, they entertain multiple lexical candidates on the fly, depending on the segmental overlap among the words in the candidate set. Importantly, incremental understanding is also observed for synthesized instructions. These results, including some suggestive differences in responses to the two text-to-speech systems, establish the potential of eye-tracking methodology combined with synthesized speech stimuli as a powerful theoretical and experimental tool for research on spoken language processing.