ISCA Archive Interspeech 2022

Normalization of code-switched text for speech synthesis

Sreeram Manghat, Sreeja Manghat, Tanja Schultz

In multilingual communities, code-switching is a common phenomenon. With the growing use of social media, a high level of code-switching is present in social media text as well. These code-switched social media texts are often written in a monolingual script. The text normalization techniques of conventional Text-to-Speech (TTS) and machine translation systems may not be able to handle such code-switched texts. Malayalam is a low-resource Indic language. Conversational Malayalam contains a high level of inter-sentential, intra-sentential, and intra-word code-switching with English. This paper specifies techniques for handling Malayalam-English code-switched text data. Evaluation results of experiments conducted on Malayalam-English code-switched data are also presented.