Whereas early systems took isolated syllables as input units and required tedious speaker enrollment, our research focuses on speaker-independent, word-based dictation. A purpose-built 120-speaker database was collected for training; acoustic models that depend on inter-syllable context, tone, and endpoints are applied to MFCC features; two-pass acoustic matching accelerates recognition by taking full advantage of the monosyllabic structure of Chinese speech; and complete word bigram and trigram models serve as the language processing module. With these measures, the system reaches 90% character accuracy and runs in almost real time on a Pentium PC without DSP assistance.
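To make the role of the word bigram and trigram concrete, the following is a minimal sketch of how such a language module might rank competing word hypotheses. It is an illustration under stated assumptions, not the system's actual implementation: the linear interpolation scheme, the weights (0.6, 0.3, 0.1), the toy corpus, and the names `train`, `prob`, and `best_candidate` are all hypothetical and do not come from the original work.

```python
from collections import Counter

def train(sentences):
    """Count unigrams, bigrams, trigrams and their histories (toy model)."""
    uni, bi, tri = Counter(), Counter(), Counter()
    ctx1, ctx2 = Counter(), Counter()  # how often each history was seen
    for words in sentences:
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for i in range(2, len(padded)):
            w2, w1, w = padded[i - 2], padded[i - 1], padded[i]
            uni[w] += 1
            bi[(w1, w)] += 1
            tri[(w2, w1, w)] += 1
            ctx1[w1] += 1
            ctx2[(w2, w1)] += 1
    return {"uni": uni, "bi": bi, "tri": tri,
            "ctx1": ctx1, "ctx2": ctx2, "total": sum(uni.values())}

def prob(model, w2, w1, w, weights=(0.6, 0.3, 0.1)):
    """Interpolated P(w | w2, w1); the weights are assumed, not tuned."""
    l3, l2, l1 = weights
    c2 = model["ctx2"][(w2, w1)]
    c1 = model["ctx1"][w1]
    p3 = model["tri"][(w2, w1, w)] / c2 if c2 else 0.0
    p2 = model["bi"][(w1, w)] / c1 if c1 else 0.0
    p1 = model["uni"][w] / model["total"] if model["total"] else 0.0
    return l3 * p3 + l2 * p2 + l1 * p1

def best_candidate(model, w2, w1, candidates):
    """Pick the word hypothesis the language model scores highest."""
    return max(candidates, key=lambda w: prob(model, w2, w1, w))

# Toy corpus of pre-segmented word sequences (illustrative only).
corpus = [["今天", "天气", "好"],
          ["今天", "天气", "不错"],
          ["他", "生气", "了"]]
model = train(corpus)
choice = best_candidate(model, "<s>", "今天", ["天气", "生气"])
print(choice)
```

In this sketch the acoustic stage is assumed to have already proposed the candidate words; the language model only rescoring them is a simplification. For histories never seen in training, the trigram and bigram terms fall to zero and only the unigram term contributes, so a real system would need a proper smoothing or backoff scheme.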