ISCA Archive Interspeech 2023

SASPEECH: A Hebrew Single Speaker Dataset for Text To Speech and Voice Conversion

Orian Sharoni, Roee Shenberg, Erica Cooper

We present SASPEECH, a 30-hour single-speaker Hebrew corpus accompanied by a text-to-speech (TTS) benchmark. Our TTS benchmark was developed with other low-resource languages in mind, so that it can be adapted and potentially generalized. For the proposed method to work, one must either have several hours of recordings with transcripts or work in a language that is included in the Whisper model. SASPEECH is the first large-scale, high-quality open dataset of its kind. It therefore allows a discussion of the challenges Hebrew presents when incorporated into generative models, for instance bridging the gap between modern Hebrew lettering, which lacks diacritics, and correct pronunciation. We also tackle prominent issues shared by low-resource languages and examine how to evaluate output quality without a benchmark. We believe our work will facilitate future generative Hebrew tools and low-resource language research. The corpus is publicly accessible at https://www.openslr.org/134.