ISCA Archive Eurospeech 1993

Automatic segmentation of speech for TTS

Andrej Ljolje, Michael D. Riley

Most computer text-to-speech systems rely on a large, fully segmented speech database recorded from a single speaker. Here we investigate automatic segmentation of speech using a probabilistic speech model of the kind normally used in speech recognition. The model is based on hidden Markov models (HMMs) and provides phone boundary locations given the speech and its phone sequence. Results obtained on a phonetically rich set of 50 sentences, using a model trained on speech from the same speaker, are much better than those obtained on a speaker-independent task [1]. More than 80% of the automatically placed boundaries are within 11.5 ms of the boundary locations selected by a human segmenter.
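The abstract does not describe the model topology or the scoring procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes one left-to-right state per phone with no transition probabilities, and it assumes that per-frame log-likelihoods are already available from some acoustic model. The function names `force_align` and `fraction_within` and the toy numbers are hypothetical. The first function places phone boundaries with a Viterbi pass over the known phone sequence; the second computes the kind of agreement figure quoted above, the fraction of automatic boundaries falling within a tolerance of the reference boundaries.

```python
import numpy as np


def force_align(log_likes):
    """Viterbi alignment of frames to a known phone sequence.

    log_likes[t, j] is the log-likelihood of frame t under the j-th phone
    of the sequence (assumed input; one state per phone is a simplification).
    Returns the start frame of each phone.
    """
    log_likes = np.asarray(log_likes, dtype=float)
    n_frames, n_phones = log_likes.shape
    if n_frames < n_phones:
        raise ValueError("need at least one frame per phone")

    score = np.full((n_frames, n_phones), -np.inf)
    advanced = np.zeros((n_frames, n_phones), dtype=bool)
    score[0, 0] = log_likes[0, 0]  # the path must start in the first phone

    for t in range(1, n_frames):
        for j in range(n_phones):
            stay = score[t - 1, j]
            move = score[t - 1, j - 1] if j > 0 else -np.inf
            if move > stay:
                score[t, j] = move + log_likes[t, j]
                advanced[t, j] = True  # phone j begins at frame t
            else:
                score[t, j] = stay + log_likes[t, j]

    # Trace back from the last phone at the last frame.
    starts = [0] * n_phones
    j = n_phones - 1
    for t in range(n_frames - 1, 0, -1):
        if advanced[t, j]:
            starts[j] = t
            j -= 1
    return starts


def fraction_within(auto_ms, ref_ms, tol_ms):
    """Fraction of automatic boundaries within tol_ms of the reference ones."""
    auto = np.asarray(auto_ms, dtype=float)
    ref = np.asarray(ref_ms, dtype=float)
    return float(np.mean(np.abs(auto - ref) <= tol_ms))


if __name__ == "__main__":
    # Toy scores: 6 frames, 3 phones; frames 0-1 favour phone 0,
    # frames 2-4 favour phone 1, frame 5 favours phone 2.
    toy = np.full((6, 3), -5.0)
    toy[0:2, 0] = -1.0
    toy[2:5, 1] = -1.0
    toy[5, 2] = -1.0
    print(force_align(toy))
    # Hypothetical boundary times in milliseconds, scored at 11.5 ms.
    print(fraction_within([100, 200, 300, 400], [105, 215, 300, 390], 11.5))
```

Start frames would be converted to times by multiplying by the analysis frame shift, which the abstract does not state. A practical system would typically use several states per phone and trained transition probabilities.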