ISCA Archive SLaTE 2023

Annotation of L2 English Speech for Developing and Evaluating End-to-End Spoken Grammatical Error Correction

Katherine M Knill, Diane Nicholls, Mark Gales, Pawel Stroinski, Alex Watkinson

A challenge for automated spoken language assessment and feedback is the lack of high-quality, manually annotated L2 learner corpora, even for a common language like English. At the same time, end-to-end systems, which integrate automatic speech recognition (ASR) with downstream tasks, have become increasingly popular. This paper describes the annotation of a corpus that supports end-to-end system evaluation for Spoken Grammatical Error Correction (SGEC). This raises a number of challenges. The task is further complicated because the annotation should ideally support the development and evaluation of individual modules, such as ASR, disfluency detection and GEC, of combinations of these modules, and of the final end-to-end system. A detailed description is given of the process used to annotate data from the Linguaskill Speaking test, a multi-level test for candidates from CEFR levels below A1 to C1 and above. An example of how the corpus has been used to evaluate an advanced SGEC system is presented.
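The abstract does not specify the annotation format. As an illustration of how one layered annotation can serve individual modules, module combinations and the full end-to-end system, the following Python sketch assumes a hypothetical record with three tiers: a verbatim manual transcription, a per-token disfluency flag, and a grammatically corrected fluent transcript. The class `AnnotatedUtterance`, the function `reference_pairs` and the stage names are illustrative assumptions, not the corpus's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AnnotatedUtterance:
    # Hypothetical tiers; the real corpus format may differ.
    verbatim: list[str]    # manual transcription, including disfluencies
    disfluent: list[bool]  # per-token flag: True if the token is a disfluency
    corrected: list[str]   # grammatically corrected, fluent transcript

    def __post_init__(self) -> None:
        # Each verbatim token needs exactly one disfluency flag.
        if len(self.verbatim) != len(self.disfluent):
            raise ValueError("verbatim and disfluent tiers must align")

    @property
    def fluent(self) -> list[str]:
        # Fluent transcript = verbatim tokens with disfluencies removed.
        return [w for w, d in zip(self.verbatim, self.disfluent) if not d]


def reference_pairs(u: AnnotatedUtterance) -> dict[str, tuple[list[str] | None, list[str]]]:
    """Map each evaluation stage to (reference input, reference output).

    None as input means the stage starts from audio rather than text.
    """
    return {
        "asr": (None, u.verbatim),           # audio -> verbatim transcript
        "disfluency": (u.verbatim, u.fluent),  # verbatim -> fluent
        "gec": (u.fluent, u.corrected),        # fluent -> corrected
        "asr+disfluency": (None, u.fluent),    # audio -> fluent (module combination)
        "end_to_end": (None, u.corrected),     # audio -> corrected (full SGEC)
    }


# Usage: a repetition ("i") and a filler ("um") followed by an agreement error.
u = AnnotatedUtterance(
    verbatim="i um i goes to school".split(),
    disfluent=[True, True, False, False, False, False],
    corrected="i go to school".split(),
)
pairs = reference_pairs(u)
```

Deriving every stage's reference from the same aligned tiers keeps module-level and end-to-end scores comparable, since they are all computed against one underlying manual annotation.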