ISCA Archive Interspeech 2015

Consonant recognition with continuous-state hidden Markov models and perceptually-motivated features

Philip Weber, Colin J. Champion, S. M. Houghton, Peter Jančovič, Martin Russell

Research into human perception of consonants has identified phoneme-specific perceptual cues. It has also been shown that the characteristics of the speech signal most useful for recognition depend on the specific speech sound. Typical ASR features and recognisers, however, neither vary with the type of sound nor relate directly to perceptual cues. We investigate classification and decoding of non-sonorant consonants using basic perceptually-motivated features: phoneme durations and energy in a few broad spectral bands. Our classification results using simple classifiers suggest that features optimal for human perception also perform best for machine classification. We show how characteristics of the learned models relate to knowledge of human speech perception. Recognition results using a continuous-state HMM (CSHMM) show accuracy similar to that of a discrete-state HMM with similar assumptions. We conclude by outlining how the CSHMM provides a mechanism to make use of other perceptually-important features by integration with similar models for recognition of voiced sounds.