ISCA Archive Interspeech 2007

Automatic phonetic segmentation of Spanish emotional speech

A. Gallardo-Antolín, R. Barra, Marc Schröder, Sacha Krstulović, J. M. Montero

Unit selection is the state-of-the-art technique for high-quality synthetic emotional speech. However, it requires a large phonetically segmented corpus that is expensive to produce, so cost-effective automatic segmentation techniques should be studied. The HMM experiments in this paper indicate that: (1) segmentation performance can depend heavily on the segmental or prosodic nature of the intended emotion (segmental emotions are more difficult to segment than prosodic ones); (2) several emotions should be combined to obtain a larger training set, especially when prosodic emotions are involved and when the training set is small; and (3) combining emphatic and non-emphatic emotional recordings (short sentences vs. long paragraphs) can degrade overall performance.