Ensuring accurate pronunciation is critical for high-quality text-to-speech (TTS). This typically requires a phoneme-based pronunciation dictionary, which is labour-intensive and costly to create. Previous work has suggested using graphemes instead of phonemes, but the inevitable pronunciation errors that occur cannot be fixed, since there is no longer a pronunciation dictionary. As an alternative, speech-based self-supervised learning (SSL) models have been proposed for pronunciation control, but these models are computationally expensive to train, produce representations that are not easily interpretable, and capture unwanted non-phonemic information. To address these limitations, we propose Spell4TTS, a novel method that generates acoustically-informed word spellings. Spellings are both interpretable and easily edited. The method could be applied to any existing pre-built TTS system. Our experiments show that the method creates word spellings that lead to fewer TTS pronunciation errors than the original spellings, or an Automatic Speech Recognition baseline. Additionally, we observe that pronunciation can be further enhanced by ranking candidates in the space of SSL speech representations, and by incorporating Human-in-the-Loop screening over the top-ranked spellings devised by our method. By working with spellings of words (composed of characters), the method lowers the entry barrier for TTS system development for languages with limited pronunciation resources. It should reduce the time and cost involved in creating and maintaining pronunciation dictionaries.
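
The ranking step mentioned above can be illustrated with a minimal sketch. It is not the Spell4TTS implementation: it assumes each candidate spelling has already been synthesised and reduced to a single fixed-length vector (for example, a mean-pooled SSL representation), and it assumes cosine similarity to a reference recording as the ranking criterion. The abstract specifies neither choice. The function names, the example word, and the toy vectors are hypothetical placeholders.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_spellings(
    reference: Vector, candidates: Dict[str, Vector]
) -> List[Tuple[str, float]]:
    """Sort candidate spellings by similarity to the reference, most similar first.

    Assumption: `reference` is a pooled speech representation of a recording of
    the target word, and each candidate vector comes from TTS output for that
    candidate spelling.
    """
    scored = [
        (spelling, cosine_similarity(reference, vector))
        for spelling, vector in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy placeholder vectors, not real SSL features.
reference = [1.0, 0.0, 1.0]
candidates = {
    "colonel": [0.0, 1.0, 0.0],  # original spelling
    "kernel": [2.0, 0.0, 2.0],   # hypothetical respelling
    "kolonel": [1.0, 1.0, 0.0],  # hypothetical respelling
}

ranked = rank_spellings(reference, candidates)
for spelling, score in ranked:
    print(f"{spelling}: {score:.3f}")
```

In a setup like this, the top few entries of `ranked` are what a human screener would listen to before a spelling is accepted.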