ISCA Archive Interspeech 2025

Non-Standard Accent TTS Support via Large Multi-Accent Frontend Pronunciation Knowledge Transfer

Noe Berger, Siqi Sun, Korin Richmond

Mainstream text-to-speech (TTS) applications rarely offer non-standard accented voice options, perhaps in part due to practical challenges associated with the acquisition of prerequisite training data. Building on prior research, this work demonstrates that a large multi-accent neural frontend model can reduce pronunciation training data requirements by 95% for robust performance in low-resource accents. We further show that pronunciation knowledge transfer is weakly influenced by accent similarity, quantified using a Levenshtein distance-based metric. The large multi-accent paradigm thus emerges as an effective strategy for improving non-standard accent voice-building accessibility, provided source accents are selected to maximize similarity with target accents where possible.
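To make the notion of a Levenshtein distance-based accent similarity metric concrete, the following is a minimal sketch rather than the exact metric used in this work. It assumes that each accent is represented by a pronunciation lexicon mapping words to phone sequences, and that accent distance is the mean length-normalized edit distance over the words the two lexicons share. The function names, the normalization choice, and the toy lexicons are all hypothetical and serve only as illustration.

```python
from typing import Dict, List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost edit distance between two phone sequences."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accent_distance(lex_a: Dict[str, List[str]],
                    lex_b: Dict[str, List[str]]) -> float:
    """Mean normalized edit distance over shared words (assumed metric).

    Returns a value in [0, 1]; lower means more similar pronunciations.
    """
    shared = sorted(set(lex_a) & set(lex_b))
    if not shared:
        raise ValueError("lexicons share no words")
    total = 0.0
    for word in shared:
        pron_a, pron_b = lex_a[word], lex_b[word]
        longest = max(len(pron_a), len(pron_b))
        # Normalize by the longer pronunciation so long words do not dominate.
        total += levenshtein(pron_a, pron_b) / longest if longest else 0.0
    return total / len(shared)


# Hypothetical toy lexicons, not data from the paper.
TOY_A = {"bath": ["b", "a", "th"], "car": ["k", "aa", "r"]}
TOY_B = {"bath": ["b", "aa", "th"], "car": ["k", "aa"]}

if __name__ == "__main__":
    print(f"accent distance: {accent_distance(TOY_A, TOY_B):.3f}")
```

Under these assumptions, a score near 0 indicates that two accents pronounce most shared words identically, while a score near 1 indicates that they rarely agree. Such a score could be used to rank candidate source accents for a given target accent.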