ISCA Archive SpeechProsody 2024

Unsupervised modeling of vowel harmony using WaveGAN

Sneha Ray Barman, Shakuntala Mahanta, Neeraj Kumar Sharma

Neural network models of phonological learnability are said to learn the phonotactics of a language better than traditional models of learnability [1]. Our paper explores whether the Featural InfoWaveGAN architecture (fiwGAN [2]; inspired by WaveGAN [3] and InfoGAN [4]) can capture regressive vowel harmony patterns when trained unsupervised on raw acoustic data without any supply of prosodic cues. We train the model with Assamese speech data recorded by 15 native speakers. Assamese is one of the few Indian languages that exhibit phonologically regressive and word-bound vowel harmony: the [+high, +ATR] vowels [i, u] trigger right-to-left harmony of the [-ATR] vowels [ε, ɔ, ʊ], yielding [e], [o], and [u], respectively. We analyze the outputs generated by the fiwGAN model and observe that it learns the regressive directionality of harmony. It produces innovative items by stringing together vowels and consonants from the training dataset, demonstrating that it learns the phonotactics of Assamese and iterative harmony patterns over a longer domain without any relevant prosodic information in the output. We assume the model treats the outputs as abstract prosodic units, without external prosodic cues triggering vowel harmony.
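
The harmony pattern itself can be stated as a simple symbolic rule. The Python sketch below is purely illustrative and is not part of the fiwGAN model; the function name and the example word are hypothetical. It applies the regressive, iterative mapping described above: once a [+high, +ATR] trigger [i] or [u] is found, every [-ATR] vowel to its left is raised to its [+ATR] counterpart.

    # Illustrative sketch of the regressive [+ATR] harmony described above;
    # a symbolic simplification, not the paper's neural model.
    HARMONY_MAP = {"ε": "e", "ɔ": "o", "ʊ": "u"}   # [-ATR] vowel -> [+ATR] counterpart
    TRIGGERS = {"i", "u"}                          # [+high, +ATR] trigger vowels
    VOWELS = TRIGGERS | set(HARMONY_MAP) | set(HARMONY_MAP.values())

    def apply_regressive_harmony(word: str) -> str:   # hypothetical helper name
        """Scan right to left; after a trigger vowel, raise preceding [-ATR] vowels."""
        segments = list(word)
        spreading = False
        for i in range(len(segments) - 1, -1, -1):
            seg = segments[i]
            if seg not in VOWELS:
                continue                               # other segments are passed over
            if spreading and seg in HARMONY_MAP:
                seg = segments[i] = HARMONY_MAP[seg]   # raise [-ATR] -> [+ATR]
            if seg in TRIGGERS:
                spreading = True                       # harmony spreads iteratively leftward
        return "".join(segments)

    # Hypothetical example word: /pɔkʊti/ harmonizes to [pokuti]
    print(apply_regressive_harmony("pɔkʊti"))

The sketch assumes, for illustration only, that the spread continues leftward through the whole word once triggered, which corresponds to the iterative, longer-domain harmony the abstract refers to.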