ISCA Archive Interspeech 2014
ISCA Archive Interspeech 2014

Data-driven generation of text balloons based on linguistic and acoustic features of a comics-anime corpus

Sho Matsumiya, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura

Most automatic speech recognition systems existing today are still limited to recognizing what is being said, without being concerned with how it is being said. On the other hand, research on emotion recognition from speech has recently gained considerable interest, but how those emotions could be expressed in text-based communication has not been widely investigated. Our long-term goal is to construct expressive speech-to-text systems that conveys all information from acoustic speech, including verbal message, emotional state, speaker condition, and background noise, into unified text-based communication. In this preliminary study, we start with developing a system that can convey emotional speech into text-based communication by way of text balloons. As there exist many possible ways to generate the text balloons, we propose to utilize linguistic and acoustic features based on comic books and anime films. Experimental results reveal that expressive text is more preferable than static text, and the system is able to estimate the shape of text balloons with 87.01% accuracy.