ISCA Archive SpeechProsody 2024

A Comparison of Synthesis Method Impact on Listener Perception of Play-Acted Speech

Emily Lau, Brechtje Post, Kate Knill

Interest in play-acted expressive speech has grown in both Linguistics and Artificial Intelligence research, yet its acoustic and perceptual characteristics are nuanced and difficult to define. This work compares the results of two sets of listening experiments that test the impact of the Bio-informational Dimensions (BIDs) on perceptions of play-acted speech, using stimuli that were resynthesized with different methods. One method applies all pitch manipulations to each pitch point simultaneously, while the other applies them in separate steps. In both experiments, participants listened to pairs of utterances that had been resynthesized to varying degrees along the BIDs of size projection and dynamicity to simulate dramatic expressions of anger, and then rated the differences in dramatic expression between the utterances. Size projection had a significant positive impact in both experiments. Dynamicity, however, had a slightly significant negative effect in the first experiment but no significant effect in the second. These results raise further questions about which specific parameters shape perceptions of vocal expression and which should be targeted when synthesizing specific speech styles.