ISCA Archive Eurospeech 1993

Definition of subword acoustic units for wordspotting

Richard C. Rose

This paper describes a study evaluating several acoustic modeling techniques for hidden Markov model (HMM) wordspotting. The wordspotting task involves unconstrained conversational speech utterances spoken over the public switched telephone network. The task, derived from the Switchboard speech corpus [4], is to detect a small vocabulary of keywords in running speech, given approximately two hours of conversational speech utterances for training. The study was performed using an HMM wordspotter based on a continuous speech recognition model. Several interesting results were obtained that apply to small-vocabulary speech recognition problems where the input speech is relatively unconstrained and out-of-vocabulary utterances are poorly represented in training. These results concern the use of decision-tree-based allophone clustering for defining acoustic subword units, different representations for non-vocabulary words occurring in the input speech utterance, and the definition of simple language models for constraining the possible word transitions in the vicinity of keywords.