ISCA Archive Interspeech 2025

Assessment of the synthetic quality and controllability of laughing onset in speech-laugh synthesis

Ryo Setoguchi, Yoshiko Arimoto

This study is the first attempt to build a speech-laugh synthesis model using deep learning. To maintain the phonetic intelligibility of synthesized speech-laugh (SL), the model was trained with nonlaughing read speech material for the phones of both SL and speech (SP). To control laughing onset in SL, the model was also trained with SL material, which was used only for the phones of SL instances. Listening tests revealed that the naturalness score for synthesized female SL was as high as that for human SL, and that the laughter-likeness score for synthesized SL was higher than that for synthesized SP in almost all conditions. A dictation test revealed that the training for phonetic intelligibility was highly effective for synthesized SL. However, the difference between the segmented SL onset and the correct onset was greater for synthesized SL with phonetic intelligibility training than for synthesized SL without it.