ISCA Archive Eurospeech 1997

Continuous speech recognition using syllables

Rhys James Jones, Simon Downey, John S. Mason

The vast majority of work in continuous speech recognition uses phoneme-like units as the basic recognition component. The work presented here investigates the practicability of syllable-like units as the building blocks for recognition. A phonetically annotated telephony database is analysed at the syllable level, and a set of syllable-based HMMs is built. Refinements including the introduction of syllable-level bigram probabilities, word- and syllable-level insertion penalties, and the investigation of different model topologies are found to improve recogniser performance. The syllable-based recogniser gives recognition accuracies of over 60%, compared with a baseline accuracy of 35% for monophone recognition. It is envisaged that practical applications of syllable recognition could be in a hybrid system, where the most common syllable HMMs would be used in conjunction with whole-word and phoneme models.
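As an illustration of two of the refinements named above, the following minimal sketch shows how syllable-level bigram probabilities and a per-syllable insertion penalty might enter a hypothesis score in the log domain. It is not the authors' implementation: the function names, the `<s>` start marker, the add-one smoothing, the `lm_scale` parameter, and the toy syllable sequences are all assumptions made for the example, and the acoustic log-likelihoods are taken as given rather than computed from HMMs.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical sentence-start marker


def train_bigrams(sequences):
    """Count syllable contexts and syllable bigrams over training sequences."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for seq in sequences:
        prev = START
        for syl in seq:
            context_counts[prev] += 1
            bigram_counts[(prev, syl)] += 1
            vocab.add(syl)
            prev = syl
    return context_counts, bigram_counts, vocab


def bigram_logprob(prev, syl, model):
    """Log P(syl | prev) with add-one smoothing (an assumption of this sketch)."""
    context_counts, bigram_counts, vocab = model
    numerator = bigram_counts[(prev, syl)] + 1
    denominator = context_counts[prev] + len(vocab)
    return math.log(numerator / denominator)


def path_score(syllables, acoustic_logliks, model,
               lm_scale=1.0, insertion_penalty=0.0):
    """Combine acoustic scores, scaled bigram scores, and a fixed
    penalty added once per hypothesised syllable."""
    score, prev = 0.0, START
    for syl, acoustic in zip(syllables, acoustic_logliks):
        score += acoustic
        score += lm_scale * bigram_logprob(prev, syl, model)
        score += insertion_penalty
        prev = syl
    return score


# Toy training data: three syllable sequences (illustrative only).
model = train_bigrams([["con", "tin"], ["con", "tin"], ["con", "test"]])
```

A negative `insertion_penalty` lowers the score of hypotheses containing many short syllables, which is the usual way such a penalty trades insertion errors against deletion errors; the appropriate value would have to be tuned on held-out data.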