This paper proposes a new word-recognition method based on Structured Transition Networks (STNs) with phonetic segments. Phonetic segments are multiple phonological units comprising about 600 acoustic/phonetic structures, each 32 to 96 ms in duration. An STN is a state transition network composed of a main path, which represents a standard speech pattern, and branches, which represent distorted patterns. Representing speech fluctuation flexibly through these branches yields high rejection performance. Because the networks are designed from acoustic/phonetic knowledge, the method requires less training data than other statistical approaches. In an evaluation on 16 spoken words uttered by 10 unknown speakers, the method achieved a recognition rate of 93.1% and a rejection rate of 92.5% for utterances outside the vocabulary.
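As a rough illustration of the main-path-plus-branches idea, the following minimal sketch builds one toy network per word and rejects any input that no network accepts cheaply enough. It is not the method evaluated in this paper: the names `Network`, `build_network`, and `recognize`, the segment labels, the fixed branch cost, and the `max_cost` rejection threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Toy state transition network for one vocabulary word (hypothetical)."""
    word: str
    final_state: int
    # arcs[(state, segment_label)] -> (next_state, cost)
    arcs: dict = field(default_factory=dict)

    def add_arc(self, src, label, dst, cost=0.0):
        self.arcs[(src, label)] = (dst, cost)

    def score(self, segments):
        """Return the accumulated cost, or None if the input leaves the network."""
        state, total = 0, 0.0
        for label in segments:
            arc = self.arcs.get((state, label))
            if arc is None:
                return None  # no main-path arc or branch accepts this segment
            state, cost = arc
            total += cost
        return total if state == self.final_state else None


def build_network(word, main_path, branches=(), branch_cost=1.0):
    """Main path: zero-cost arcs 0 -> 1 -> ... -> n. Branches: penalized arcs."""
    net = Network(word=word, final_state=len(main_path))
    for i, label in enumerate(main_path):
        net.add_arc(i, label, i + 1)
    for src, label, dst in branches:
        net.add_arc(src, label, dst, branch_cost)
    return net


def recognize(networks, segments, max_cost=1.5):
    """Return the best-matching word, or None to reject the utterance."""
    best_word, best_cost = None, None
    for net in networks:
        cost = net.score(segments)
        if cost is not None and (best_cost is None or cost < best_cost):
            best_word, best_cost = net.word, cost
    if best_cost is None or best_cost > max_cost:
        return None
    return best_word


# Hypothetical segment labels; the branch models a distorted middle segment.
NETWORKS = [
    build_network("go", ["g", "go", "oo"], branches=[(1, "gu", 2)]),
    build_network("no", ["n", "no", "oo"]),
]
```

In this sketch, a standard pattern follows the main path at zero cost, a distorted pattern is still accepted through a branch at a penalty, and a segment sequence that no network can follow is rejected as out of vocabulary.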