ISCA Archive SLaTE 2023

Analyzing the Trade Space in Multi-lingual Automatic Text Difficulty Estimation

Esther Gupta, Douglas Jones

Assessing the difficulty of text material for foreign language learners remains a challenging task. Advances in natural language processing show promise for improving the performance of systems designed to evaluate text difficulty. In this work, we examine the impact of using multiple machine learning algorithms, transformer-based embeddings, and machine translation across 20 languages to determine their effectiveness on the text leveling task in an operational environment. We use accuracy and mean squared error (MSE) as our primary figures of merit. We also consider the computational cost, ease of implementation, and speed of each approach. We find that while sentence embedding features offer some improvements for many languages, an ensemble of more traditional algorithms yields larger performance gains at lower computational complexity.
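The two figures of merit named above measure different things. Accuracy counts only exact level matches. MSE also penalizes how far off a wrong prediction is. The following is a minimal sketch of both metrics. It assumes that difficulty levels are encoded as numbers. The reference and predicted values are hypothetical and do not come from the paper's data.

```python
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both metrics require paired, non-empty lists of levels.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of texts whose predicted level exactly matches the reference level."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between predicted and reference levels."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical numeric difficulty levels for five texts (illustrative only).
reference = [1.0, 2.0, 2.5, 3.0, 1.5]
predicted = [1.0, 2.5, 2.5, 2.0, 1.5]

print(accuracy(reference, predicted))  # 0.6  (3 of 5 exact matches)
print(mse(reference, predicted))       # 0.25 ((0.25 + 1.0) / 5)
```

In this toy example, one prediction is off by half a level and one is off by a full level. Both errors lower accuracy equally. MSE, however, weights the full-level error four times as heavily.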