In this paper, we describe a speaker-independent method for large-vocabulary word recognition that combines the concatenation of syllable templates with a stochastic dynamic time warping method, where the syllable templates are extracted from spoken words. We obtained the reference patterns from 216 words uttered by 30 male speakers and recognized a different set of 200 words uttered by 10 other speakers. On this 200-word task, the standard dynamic time warping method for speaker-independent recognition gave an average word recognition rate of 89.3%. The stochastic dynamic time warping method proposed here improved the recognition rate to 92.9%.
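To make the baseline concrete, the following is a minimal sketch of word recognition by concatenated syllable templates with standard dynamic time warping. It does not reproduce the stochastic variant proposed here, whose formulation is not given in this abstract. The function names (`dtw_distance`, `concatenate`, `recognize`), the Euclidean local distance, the symmetric step pattern, and the one-template-per-syllable assumption are all illustrative choices, not details taken from the paper.

```python
import math

def dtw_distance(a, b):
    """Standard DTW cost between two sequences of feature vectors.

    Assumptions (not from the paper): Euclidean local distance and the
    basic step pattern (insertion, deletion, match) with no slope limits.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # deletion
                                     cost[i][j - 1],      # insertion
                                     cost[i - 1][j - 1])  # match
    return cost[n][m]

def concatenate(templates, syllables):
    """Build a word reference pattern by joining syllable templates in order."""
    reference = []
    for syllable in syllables:
        reference.extend(templates[syllable])
    return reference

def recognize(utterance, lexicon, templates):
    """Return the lexicon word whose concatenated reference is closest under DTW."""
    return min(
        lexicon,
        key=lambda word: dtw_distance(utterance,
                                      concatenate(templates, lexicon[word])),
    )
```

In this sketch, `templates` maps a hypothetical syllable label to a list of feature vectors and `lexicon` maps each word to its syllable sequence, so adding a word to the vocabulary only requires its syllable spelling rather than a new spoken reference pattern.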