ISCA Archive ISAPh 2021

Comparison of accent theories of Japanese using E2E speech synthesis in terms of their effectiveness for learners to acquire natural prosody

Nobuaki Minematsu, Fuki Yoshizawa, Tadashi Kumano, Kiyoshi Kurihara, Daisuke Saito

Two major theories of lexical accent are available for teaching Japanese prosody to learners, but they decompose an observed pitch pattern into lexical and phrasal components differently. Which theory is more valid pedagogically? In this work, a candidate answer is sought by comparing the two theories using text-to-speech conversion technologies. An end-to-end speech synthesizer is treated as a machine learner and is built in two ways, using a speech corpus annotated according to each of the two theories. The naturalness of the synthesized voices is compared, and it is shown that the two theories do not make any significant difference. With this finding in mind, a pedagogical suggestion is made, from a practitioner’s point of view, on which theory should be used in which teaching context. The two theories correspond directly to opposite approaches to teaching and/or learning prosody, i.e., analytic and holistic.