This paper examines F0 modeling and generation techniques for spontaneous speech synthesis. In the previous study, we proposed a prosodic-unit HMM where the synthesis unit is defined as a segment between two prosodic events represented by a ToBI label framework. To take the advantage of the prosodic-unit HMM, continuous F0 sequences must be modeled from discontinuous F0 data including unvoiced regions. The conventional F0 models such as the MSD-HMM and the continuous F0 HMM are not always appropriate for such demand. To overcome this problem, we propose an alternative F0 model named discontinuous observation HMM (DO-HMM) where the unvoiced frames are regarded as missing data. We objectively evaluate the performance of the DO-HMM by comparing it with the conventional F0 modeling techniques and discuss the results.
Index Terms: HMM-based speech synthesis, F0 modeling, prosody generation, discontinuous observation HMM, spontaneous speech.