ISCA Archive Interspeech 2006

Automatic speech segmentation with multiple statistical models

Seung Seop Park, Jong Won Shin, Nam Soo Kim

In this paper, we propose a novel approach to improving the performance of automatic speech segmentation for concatenative text-to-speech synthesis. Several automatic segmentation machines (ASMs) are applied simultaneously, and the final boundary time marks are drawn from their multiple segmentation results. To identify the best time mark among those provided by the ASMs, we apply a candidate selector trained on a manually segmented speech database. The candidate selector defines a mapping from each phonetic boundary to the index of the ASM whose time mark is expected to be closest to the manual segmentation result. The experimental results show that our approach dramatically improves segmentation accuracy.
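
As an illustration of the selection idea only, the following minimal sketch assumes that each phonetic boundary is described by a label (here a hypothetical pair of left and right phone names) and that the selector simply picks, for each label, the ASM with the smallest total absolute error on the manually segmented training data. The function names `train_selector` and `select_mark`, the label format, and the fallback to ASM 0 for unseen labels are all assumptions made for this example; the abstract does not specify how the candidate selector is actually trained.

```python
from collections import defaultdict


def train_selector(training_data):
    """Learn a mapping from boundary label to the index of the best ASM.

    training_data: iterable of (label, asm_marks, manual_mark), where
    asm_marks is a list with one time mark (in seconds) per ASM and
    manual_mark is the hand-labelled boundary time.
    """
    # Accumulated absolute error, per boundary label and per ASM index.
    errors = defaultdict(lambda: defaultdict(float))
    for label, asm_marks, manual_mark in training_data:
        for asm_index, mark in enumerate(asm_marks):
            errors[label][asm_index] += abs(mark - manual_mark)
    # For each label, keep the ASM index with the smallest total error.
    return {label: min(e, key=e.get) for label, e in errors.items()}


def select_mark(selector, label, asm_marks, default_asm=0):
    """Return the time mark of the ASM chosen for this boundary label.

    Unseen labels fall back to default_asm (an assumption of this sketch).
    """
    return asm_marks[selector.get(label, default_asm)]


# Toy data: two ASMs, two boundary labels (values are made up).
train = [
    (("a", "t"), [0.100, 0.120], 0.118),
    (("a", "t"), [0.300, 0.315], 0.312),
    (("s", "i"), [0.500, 0.540], 0.505),
]
selector = train_selector(train)
final_mark = select_mark(selector, ("a", "t"), [0.700, 0.710])
```

In this toy data the second ASM is closer to the manual marks at ("a", "t") boundaries and the first ASM is closer at ("s", "i") boundaries, so the selector routes each boundary type to a different ASM. A richer selector could condition on broader phonetic context instead of an exact label match.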