ISCA Archive Interspeech 2024

USD-AC: Unsupervised Speech Disentanglement for Accent Conversion

Jen-Hung Huang, Wei-Tsung Lee, Chung-Hsien Wu

This study proposes USD-AC, an unsupervised speech disentanglement framework for accent conversion that requires neither parallel data nor text transcriptions for training, addressing challenges such as limited labeled data and poor generalizability. Grounded in speech decomposition, USD-AC separates accent features from linguistic content, enhancing its adaptability across various accent conversion tasks. It uses a pre-trained ASR model to extract linguistic content and incorporates an accent embedding for accent representation. Adversarial training effectively disentangles accent information from the other attributes, boosting conversion performance. USD-AC achieves strong results for known speakers and accents and generalizes well to unseen speakers, accents, and content. In experimental comparisons, the unsupervised USD-AC outperforms supervised learning methods in both conversion quality and generalization ability.