An automatic speech recognition system is described in which the energy-time profiles in several frequency bands represent an input utterance and are compared with a reference set obtained during training with many different speakers. To considerably reduce both the number of misrecognitions and the overall matching time, a zero-crossing count front end performs an initial voice/fricative classification. The recognition scheme is best suited to monosyllabic languages; it is very simple, avoids time-warping, and permits low-cost implementation on a microcomputer. The system was evaluated on speaker-independent isolated word recognition of the ten Cantonese digits, and a mean recognition accuracy of about 90-95% was obtained.
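
The two-stage scheme outlined above can be illustrated with a minimal Python sketch. It is not the system's actual implementation. The abstract states only that zero-crossing counts pre-classify the utterance and that band energy-time profiles are matched against references without time-warping. Everything else here is an assumption made for illustration: the function names, the zero-crossing threshold, the peak normalization, the linear resampling to a fixed number of points, and the squared-error distance. The band energy profiles are assumed to be already available, one list of energy values per frequency band.

```python
def zero_crossing_count(samples):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))


def initial_class(samples, n_initial, zc_threshold):
    """Label the utterance onset from its zero-crossing count.

    n_initial and zc_threshold are hypothetical tuning parameters.
    """
    count = zero_crossing_count(samples[:n_initial])
    return "fricative" if count > zc_threshold else "voiced"


def resample(profile, n_points):
    """Linearly interpolate a profile to n_points values (n_points >= 2).

    This fixed-length mapping stands in for time-warping (assumed approach).
    """
    if len(profile) == 1:
        return [float(profile[0])] * n_points
    out = []
    for i in range(n_points):
        pos = i * (len(profile) - 1) / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, len(profile) - 1)
        frac = pos - lo
        out.append(profile[lo] * (1 - frac) + profile[hi] * frac)
    return out


def normalize(profile):
    """Scale a profile by its peak so loudness differences matter less."""
    peak = max(profile) or 1.0
    return [v / peak for v in profile]


def distance(bands_a, bands_b, n_points=16):
    """Sum of squared differences over all bands after resampling."""
    total = 0.0
    for pa, pb in zip(bands_a, bands_b):
        ra = normalize(resample(pa, n_points))
        rb = normalize(resample(pb, n_points))
        total += sum((x - y) ** 2 for x, y in zip(ra, rb))
    return total


def recognize(bands, onset_class, references):
    """Return the label of the nearest reference.

    references is a list of (label, onset_class, bands) tuples. Only
    references sharing the onset class are searched, which is how the
    pre-classification would cut matching time in this sketch.
    """
    candidates = [r for r in references if r[1] == onset_class] or references
    return min(candidates, key=lambda r: distance(bands, r[2]))[0]
```

In this sketch the pre-classification roughly halves the search when the vocabulary splits evenly between the two onset classes, and an utterance is never compared with a reference whose onset class differs from its own.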