ISCA Archive SIGUL 2023

Neural Speech Synthesis for Austrian Dialects with Standard German Grapheme-to-Phoneme Conversion and Dialect Embeddings

Lorenz Gutscher, Michael Pucher, Víctor Garcia

For languages where extensive audio data and text transcriptions are available, text-to-speech (TTS) systems have showcased the ability to generate speech that closely resembles natural human speech. However, the development of TTS systems for dialects and language varieties poses challenges such as limited data availability and strong regional variations. This paper presents a TTS system tailored for under-resourced language varieties spoken in Austrian regions. The system is built upon the FastSpeech 2 architecture and includes modifications to incorporate dialect embeddings for training and inference. It is demonstrated that employing dialect embeddings and a standard German grapheme-to-phoneme conversion is effective in modeling language varieties and provides means to shift a person’s spoken variety from one to another. This allows for the generation of regional standards for dialect speakers or the generation of dialect speech with the voice of a standard speaker. The findings unveil new possibilities and applications in other multilingual contexts where shared characteristics within the language or dialect embedding space can be leveraged.
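The role of a dialect embedding can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the embedding is combined with the FastSpeech 2 encoder output, so element-wise addition is an assumption here, and the function names, dimensions, and dialect labels are all hypothetical.

```python
import random

# Hypothetical dialect inventory; the labels are placeholders, not from the paper.
DIALECT_IDS = {"standard": 0, "dialect_a": 1}

def make_embedding_table(num_dialects, dim, seed=0):
    """Create one vector per dialect (randomly initialised; learned in a real model)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_dialects)]

def condition_on_dialect(encoder_states, table, dialect_id):
    """Add the chosen dialect vector to every encoder frame (assumed combination)."""
    emb = table[dialect_id]
    return [[h + e for h, e in zip(frame, emb)] for frame in encoder_states]

# Three phoneme frames with a hidden size of 4, standing in for encoder output.
states = [[0.0] * 4 for _ in range(3)]
table = make_embedding_table(len(DIALECT_IDS), 4)

# Same phoneme sequence, two different dialect conditions.
as_standard = condition_on_dialect(states, table, DIALECT_IDS["standard"])
as_dialect = condition_on_dialect(states, table, DIALECT_IDS["dialect_a"])
```

Under these assumptions, shifting a speaker's variety at inference amounts to passing a different dialect ID while the phoneme input, produced by the standard German grapheme-to-phoneme conversion, stays the same.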


doi: 10.21437/SIGUL.2023-15

Cite as: Gutscher, L., Pucher, M., Garcia, V. (2023) Neural Speech Synthesis for Austrian Dialects with Standard German Grapheme-to-Phoneme Conversion and Dialect Embeddings. Proc. 2nd Annual Meeting of the ELRA/ISCA SIG on Under-resourced Languages (SIGUL 2023), 68-72, doi: 10.21437/SIGUL.2023-15

@inproceedings{gutscher23_sigul,
  author={Lorenz Gutscher and Michael Pucher and Víctor Garcia},
  title={{Neural Speech Synthesis for Austrian Dialects with Standard German Grapheme-to-Phoneme Conversion and Dialect Embeddings}},
  year=2023,
  booktitle={Proc. 2nd Annual Meeting of the ELRA/ISCA SIG on Under-resourced Languages (SIGUL 2023)},
  pages={68--72},
  doi={10.21437/SIGUL.2023-15}
}