This paper investigates a new system for time-aligning a speech waveform with its phonetic transcription. The system is based on continuous Hidden Markov Models (HMMs) combined with Mel-frequency cepstral coefficients (MFCCs). Two approaches are developed. In the first, the centisecond approach, the alignment is performed in one pass. The second, the segmental approach, achieves the phonetic alignment in two phases: the speech signal is first segmented with a temporal method; each segment is then represented by a vector of MFCCs and constitutes one observation of our global HMM. The best results are obtained with the segmental approach, which yields 25% disagreement with manual labelling at a tolerance of ±20 ms, using only ten context-independent classes of phones.
Keywords: Phonetic time-alignment, Hidden Markov Models, automatic segmentation.
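
The disagreement figure quoted above can be read as the fraction of automatically placed phone boundaries that fall more than 20 ms away from the corresponding manual boundary. The following is a minimal sketch of that measure, not the evaluation code used in this work. It assumes a one-to-one correspondence between automatic and manual boundaries (which holds when both follow the same phonetic transcription) and treats a deviation of exactly 20 ms as agreement; the function name `disagreement_rate` and the boundary times are hypothetical.

```python
def disagreement_rate(auto_ms, manual_ms, tolerance_ms=20.0):
    """Fraction of boundaries deviating from the manual label by more than tolerance_ms.

    Assumption: auto_ms[i] and manual_ms[i] refer to the same phone boundary.
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must have the same length")
    if not manual_ms:
        raise ValueError("at least one boundary is required")
    # A boundary disagrees when its absolute deviation exceeds the tolerance.
    misses = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) > tolerance_ms)
    return misses / len(manual_ms)


# Illustrative (made-up) boundary times in milliseconds.
manual = [100.0, 250.0, 400.0, 620.0]
auto = [105.0, 280.0, 395.0, 641.0]
rate = disagreement_rate(auto, manual)
print(rate)  # deviations are 5, 30, 5 and 21 ms, so 2 of 4 disagree: 0.5
```

Under this reading, a 25% disagreement rate means that one boundary in four lies outside the ±20 ms window around its manual counterpart.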