ISCA Archive Interspeech 2022
ISCA Archive Interspeech 2022

An Automatic Soundtracking System for Text-to-Speech Audiobooks

Zikai Chen, Lin Wu, Junjie Pan, Xiang Yin

Background music (BGM) plays an essential role in audiobooks, which can enhance the immersive experience of audiences and help them better understand the story. However, well-designed BGM still requires human effort in the text-to-speech (TTS) audiobook production, which is quite time-consuming and costly. In this paper, we introduce an automatic soundtracking system for TTS-based audiobooks. The proposed system divides the soundtracking process into three tasks: plot partition, plot classification, and music selection. The experiments shows that both our plot partition module and plot classification module outperform baselines by a large margin. Furthermore, TTS-based audiobooks produced with our proposed automatic soundtracking system achieves comparable performance to that produced with the human soundtracking system. To our best of knowledge, this is the first work of automatic soundtracking system for audiobooks. Demos are available on https://acst1223.github.io/interspeech2022/main.