Several Time-Delay Neural Network (TDNN) architectures applied to speaker-dependent and multi-speaker phoneme recognition are compared with respect to their capabilities on a speaker-independent phoneme recognition problem. Experiments on recognizing the voiced stops /b, d, g/ using six and twelve training speakers yielded high average recognition rates of 91.3% and 93.6%, respectively, for eight test speakers. In addition, constructing networks from speaker modules is effective in saving training time and leads to higher recognition performance than a single TDNN structure of comparable network capacity. Furthermore, based on these results, we propose an extended architecture for recognizing all phonemes.