In this paper, an HMM-based visual speech synthesis system driven by audio speech input is designed to animate a face model while the synchronous audio is played. Our approach differs from many existing methods in its training procedure. We first train models for every initial and final in Mandarin, using a large quantity of audio training data recorded in different environments and spoken by different people. The trained models are then adapted to our specific announcer using recorded synchronous audiovisual speech data. The resulting models are more robust in the synthesis phase, and satisfactory performance is achieved even when the input audio speech is degraded by noise.
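The two-stage scheme above can be sketched roughly as follows. This is a minimal illustration under our own simplifying assumptions, not the paper's implementation: each Mandarin initial/final is reduced to a single Gaussian mean (a stand-in for a full HMM state), estimated on pooled multi-speaker data and then adapted toward the target announcer with a MAP-style interpolation. All names (`train_unit_mean`, `map_adapt`, `tau`) are hypothetical.

```python
from statistics import mean

def train_unit_mean(multi_speaker_frames):
    """Speaker-independent estimate: average over pooled training frames
    from many speakers and recording conditions."""
    return mean(multi_speaker_frames)

def map_adapt(prior_mean, announcer_frames, tau=10.0):
    """MAP-style adaptation: interpolate the speaker-independent prior
    mean with the announcer's sample mean, weighted by the amount of
    adaptation data (tau controls how strongly the prior is trusted)."""
    n = len(announcer_frames)
    return (tau * prior_mean + n * mean(announcer_frames)) / (tau + n)

# Toy 1-D "acoustic" frames for one unit (e.g. the final /a/):
pooled = [1.0, 1.2, 0.8, 1.1, 0.9]   # many speakers, varied conditions
announcer = [1.6, 1.5, 1.7]          # small amount of announcer data

prior = train_unit_mean(pooled)
adapted = map_adapt(prior, announcer)

# The adapted mean moves from the pooled prior toward the announcer's
# statistics, but stays anchored by the prior while adaptation data
# is scarce — which is why the adapted models remain robust to noisy input.
assert prior < adapted < mean(announcer)
```

In a real system the same idea is applied per Gaussian mixture component of each HMM state, but the interpolation structure is the same: little announcer data keeps the model close to the robust multi-speaker prior.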