Substantial pitch variation due to tonal coarticulation occurs when lexical tones are produced in succession. When a coarticulated tone is excised from its context and played in isolation, coarticulatory pitch variations may inhibit tone recognition. It remains unclear how listeners use coarticulatory pitch cues for online speech recognition in context. Using the printed-word eye-tracking paradigm, we tested the recognition of the high tone in a low-high tonal sequence by native Standard Chinese (SC) listeners. In this sequence, the high tone exhibits coarticulatory rising f0. We manipulated the presentation of the preceding low tone: auditorily and visually present; auditorily absent (i.e., substituted by pink noise) but visually present; or visually replaced by a high tone (to prompt an inappropriate tonal coarticulatory cue). Analyses of the point of divergence and the proportions of eye fixations revealed that listeners’ correct fixations on the high-tone target started early and increased quickly even though only the rising f0 part of the high tone was played auditorily, with a gradual delay that followed the degree of compatibility between the visual and auditory stimulus presentations. The immediate use of tonal coarticulation for speech recognition by SC listeners suggests the need for fine-grained coarticulatory information in speech representation and processing.