Modeling pronunciation variation in spontaneous speech is essential for improving recognition accuracy. One limitation of current recognition systems is that their recognition dictionaries contain only one standard pronunciation per entry, so the amount of variability that can be modeled is very limited. In this paper, we propose generating rule-based pronunciation networks to replace the traditional dictionary in the decoder. The networks take the special structure of Chinese into account and incorporate acceptable variants of each Chinese syllable. In addition, an automatic learning algorithm is designed to derive the variation rules. The proposed method was evaluated on the 1997 Hub4NE Mandarin Broadcast News Corpus with the HLTC stack decoder. The syllable recognition error rate was reduced by 3.20% absolute when both intra- and inter-syllable variations were modeled.
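To illustrate the idea of expanding a single canonical pronunciation into a network of acceptable variants, the following is a minimal sketch assuming a simple context-free rule table over Pinyin initials and finals; the example rules, syllables, and data structures are invented for illustration and are not the rules learned by the paper's algorithm, which also covers inter-syllable context.

```python
from itertools import product

# Hypothetical variation rules on Pinyin initials/finals, e.g. the retroflex
# initial "zh" may surface as the dental "z" in spontaneous speech, and the
# velar nasal final "ang" may merge toward the alveolar "an".
VARIATION_RULES = {
    "zh": ["zh", "z"],     # retroflex -> dental reduction
    "sh": ["sh", "s"],
    "ang": ["ang", "an"],  # final nasal merger
}

def syllable_variants(initial, final):
    """Expand one canonical (initial, final) syllable into its set of
    acceptable surface variants by applying every applicable rule."""
    initial_alts = VARIATION_RULES.get(initial, [initial])
    final_alts = VARIATION_RULES.get(final, [final])
    return [i + f for i, f in product(initial_alts, final_alts)]

def build_pronunciation_network(canonical_syllables):
    """Map each canonical syllable to the list of variant pronunciations
    the decoder would accept instead of the single dictionary form."""
    return {i + f: syllable_variants(i, f) for i, f in canonical_syllables}

if __name__ == "__main__":
    # Example: "zhang" = initial "zh" + final "ang"; "shi" = "sh" + "i".
    network = build_pronunciation_network([("zh", "ang"), ("sh", "i")])
    print(network)
    # {'zhang': ['zhang', 'zhan', 'zang', 'zan'], 'shi': ['shi', 'si']}
```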