This paper presents a new speaker-independent word recognition system built from three kinds of multilayer neural networks arranged hierarchically. The bottom networks identify acoustic events and compensate for time distortion. The middle networks output similarity measures for the input words. The top network acts as a classifier and outputs the recognition candidates. Speaker-independent recognition experiments on 28 isolated Japanese words were carried out using data uttered by 150 speakers (100 for training and 50 for testing). The system achieved a recognition accuracy of 97.1% with an error rate of 1.0%.
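
The three-level data flow described above can be illustrated with a minimal sketch. Only the three-tier arrangement and the 28-word vocabulary come from the text. Everything else is an assumption made for illustration: the layer sizes, the sliding window, the random untrained weights, the use of one middle network per word, and all names (`make_mlp`, `forward`, `recognize`). In particular, the segment-wise max pooling is a simple stand-in for time normalization, not the alignment method used by the system.

```python
import math
import random

N_WORDS = 28      # vocabulary size stated in the text
FRAME_DIM = 16    # assumed acoustic feature dimension per frame
WINDOW = 3        # assumed number of frames seen by a bottom network
N_EVENTS = 8      # assumed number of acoustic event classes
N_SEGMENTS = 5    # assumed fixed length after time normalization


def make_mlp(sizes, rng):
    """Build a multilayer network as a list of (weights, biases) layers."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Feed-forward pass with sigmoid units."""
    for w, b in layers:
        x = [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(row, x)) + bi)))
             for row, bi in zip(w, b)]
    return x


def recognize(frames, bottom, middle, top, top_k=3):
    """Return the top_k (word_index, score) candidates for one utterance."""
    # Bottom level: score acoustic events on each sliding window of frames.
    events = [forward(bottom, sum(frames[t:t + WINDOW], []))
              for t in range(len(frames) - WINDOW + 1)]
    # Assumed time normalization: max-pool event scores into N_SEGMENTS
    # segments so that utterances of any length give a fixed-size input.
    pooled = []
    for s in range(N_SEGMENTS):
        lo = s * len(events) // N_SEGMENTS
        hi = max(lo + 1, (s + 1) * len(events) // N_SEGMENTS)
        pooled += [max(e[k] for e in events[lo:hi]) for k in range(N_EVENTS)]
    # Middle level: one network per word outputs a similarity measure.
    similarities = [forward(net, pooled)[0] for net in middle]
    # Top level: a classifier over the similarities ranks the candidates.
    scores = forward(top, similarities)
    ranked = sorted(enumerate(scores), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]


rng = random.Random(0)
bottom = make_mlp([WINDOW * FRAME_DIM, 12, N_EVENTS], rng)
middle = [make_mlp([N_SEGMENTS * N_EVENTS, 6, 1], rng) for _ in range(N_WORDS)]
top = make_mlp([N_WORDS, 20, N_WORDS], rng)

# A synthetic 20-frame utterance stands in for real acoustic features.
frames = [[rng.random() for _ in range(FRAME_DIM)] for _ in range(20)]
candidates = recognize(frames, bottom, middle, top)
```

Because the weights are untrained, the ranking in `candidates` is meaningless. The sketch shows only how an utterance of arbitrary length passes through the three levels to yield a ranked list of recognition candidates.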