ISCA Archive issp 2024

Perceptual evaluation of the naturalness of broadband articulatory speech synthesis using a 1D versus a 3D acoustic model

Rémi Blandin, Vincent Didone, Peter Birkholz, Angélique Remacle

Articulatory synthesis is a useful tool to explore the relationship between the speech production and perception processes. However, including the high frequencies (HF, above about 5 kHz) requires a three-dimensional (3D) acoustic model for realistic simulations. In this frequency range, one-dimensional (1D) acoustic models fail to predict additional resonances and anti-resonances related to the 3D properties of the acoustic field. While articulatory synthesis based on 3D acoustic models is nowadays achievable for isolated phonemes, the impact of such models on perception by human listeners remains largely unknown. The objective of this work was to determine whether a more realistic computation of transfer functions with a frequency-domain approach results in phonemes perceived as more natural. For this purpose, a perception experiment using a 4-point Likert scale was conducted to evaluate the naturalness of seven static phonemes, /a, e, i, ə, f, s, ʃ/, synthesized with a 1D and a 3D model. No significant influence of the acoustic model was found; however, significant differences between the phonemes were perceived.