A challenge for automated spoken language assessment and feedback is the lack of high-quality, manually annotated L2 learner corpora, even for a widely spoken language such as English. At the same time, end-to-end systems, which integrate automatic speech recognition (ASR) with downstream tasks, have grown in popularity. This paper describes the annotation of a corpus that supports end-to-end system evaluation for Spoken Grammatical Error Correction (SGEC). This raises a number of challenges, further complicated by the requirement that the annotation should ideally support the evaluation and development of individual modules, such as ASR, disfluency detection and GEC, of combinations of these modules, and of the final end-to-end system. A detailed description is given of the process used to annotate data from the Linguaskill Speaking test, a multi-level test for candidates ranging from CEFR levels below A1 to C1 and above. Finally, an example of how the corpus has been used to evaluate an advanced SGEC system is presented.