In this paper we discuss the possibility of modeling contextual variation and the effects of speaking rate at a symbolic level. Contextual deformations of speech are described in a speech-event-synchronized way rather than in the traditional time-synchronized way. Our objective is to compile dictionary phonetic transcriptions, together with context models, into a symbolic representation of speech in which contextual deformations and speaking-rate variations are handled by an explicit context-dependent model, in order to enhance a continuous speech recognition system. We describe such a model and test it on a vocabulary of 400 French words pronounced by three male speakers and one female speaker. We found that using the model leads to an improvement in the recognition rate, especially in speaker-independent mode.
Keywords: Continuous speech recognition, Contextual deformations, Speaking rate