ISCA Archive SLaTE 2023

Grammatical Error Correction for L2 Speech Using Publicly Available Data

Stefano Bannò, Michela Rais, Marco Matassoni

Over the past decades, the demand for learning English as a second language (L2) has grown steadily, as English has gradually become the lingua franca of business, culture, entertainment, and academia. This growth has in turn increased the demand for automatic feedback systems in Computer-Assisted Language Learning. In this regard, mastering grammar is a key element of L2 speaking proficiency.

In this paper, we present a cascaded approach to spoken grammatical error correction (GEC) that uses only publicly available training data. Specifically, we start from learners' utterances, investigate disfluency detection, and finally explore GEC. We test this pipeline on NICT-JLE, a publicly available L2 corpus, and TLT-GEC, a private dataset that is under preparation for release. We obtain promising results that outperform those of previous studies that used large proprietary datasets, and we set a potential baseline for future experiments on spoken GEC.
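
As a rough illustration of what "cascaded" means here, the sketch below chains a disfluency-removal stage and a GEC stage over a tokenized learner transcript. It is not the system described in this paper: the filler list, the repetition heuristic, the single agreement rule, and the names `remove_disfluencies`, `correct_grammar`, and `cascade` are all hypothetical placeholders standing in for trained models.

```python
from typing import Callable, List

# Hypothetical filler inventory; a real system would use a trained tagger.
FILLERS = {"uh", "um", "er", "erm"}

Stage = Callable[[List[str]], List[str]]


def remove_disfluencies(tokens: List[str]) -> List[str]:
    """Toy disfluency removal: drop filled pauses and immediate word repetitions."""
    fluent: List[str] = []
    for tok in tokens:
        if tok.lower() in FILLERS:
            continue  # filled pause
        if fluent and fluent[-1].lower() == tok.lower():
            continue  # immediate repetition of the previous kept word
        fluent.append(tok)
    return fluent


def correct_grammar(tokens: List[str]) -> List[str]:
    """Placeholder GEC stage: one hand-written subject-verb agreement rule."""
    corrected = list(tokens)
    for i in range(1, len(corrected)):
        if corrected[i - 1].lower() in {"he", "she", "it"} and corrected[i].lower() == "go":
            corrected[i] = "goes"
    return corrected


def cascade(transcript: str, stages: List[Stage]) -> str:
    """Run each stage on the output of the previous one."""
    tokens = transcript.split()
    for stage in stages:
        tokens = stage(tokens)
    return " ".join(tokens)


if __name__ == "__main__":
    print(cascade("um he he go to to school", [remove_disfluencies, correct_grammar]))
```

The point of the design is that each stage only sees the previous stage's output, so the GEC stage receives fluent text and either component can be replaced independently.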