ISCA Archive Eurospeech 1991

An automatic diphone segmentation system

Georg E. Ottesen

This paper discusses the requirements for an automatic diphone recording and segmentation system, and presents a PC-based system. The level and speech rate are controlled for each test word at recording time. A set of Norwegian test words is segmented by two different methods: 1) a speaker-independent Hidden Markov Model (HMM), and 2) a Dynamic Time Warping (DTW) procedure adapted to one speaker. Norwegian diphones are then extracted from the segmented words. The best performance is obtained with the DTW procedure, which gives a satisfactory segmentation for about 99 percent of the diphones.

Keywords: automatic segmentation, diphone synthesis, PSOLA synthesis, dynamic time warping, Hidden Markov Model
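
As a rough illustration of how a DTW procedure can transfer segment boundaries from a labelled reference utterance to a new recording of the same word, the following minimal sketch may help. It is not the system described in the paper: the scalar per-frame features, the absolute-difference distance, the symmetric step pattern, and the function names `dtw_align` and `transfer_boundaries` are all assumptions made for the example.

```python
import math


def dtw_align(ref, test):
    """Align two sequences of scalar frame features with basic DTW.

    Returns (total_cost, path), where path is a list of
    (ref_index, test_index) pairs from the first to the last frame.
    The scalar features and absolute-difference distance are assumptions.
    """
    n, m = len(ref), len(test)
    # cost[i][j] = cheapest cumulative cost aligning ref[:i] with test[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # diagonal step
                                 cost[i - 1][j],      # advance ref only
                                 cost[i][j - 1])      # advance test only

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def transfer_boundaries(path, ref_boundaries):
    """Map each reference boundary frame to the first test frame aligned with it."""
    return [next(tj for ri, tj in path if ri >= b) for b in ref_boundaries]


# Toy data: the reference has segments starting at frames 0, 2 and 4;
# the test utterance has a longer first and last segment.
ref = [0, 0, 1, 1, 2, 2]
test = [0, 0, 0, 1, 2, 2, 2]
total, path = dtw_align(ref, test)
boundaries = transfer_boundaries(path, [2, 4])
```

In a real segmentation system the frames would be spectral feature vectors rather than scalars, and the reference boundaries would come from a hand-labelled recording of the same speaker; the sketch only shows the alignment and boundary-mapping idea.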