Language teachers often argue that the goal of speech training should be sufficiently intelligible pronunciation rather than native-sounding pronunciation, because some types of accented pronunciation are already intelligible or comprehensible enough. However, anyone who aims to build a technical framework for automatic assessment based on intelligibility or comprehensibility faces a major technical challenge: collecting L2 utterances annotated with these metrics. Furthermore, learners want to know which parts of their speech (words, morphemes, or syllables) should be corrected, so data collection requires a valid method of fine-grained intelligibility annotation. In our previous studies, we introduced a new metric, shadowability, showed experimentally that it is highly correlated with perceived intelligibility or comprehensibility, and explained theoretically that it has the potential to provide fine-grained annotations. In this paper, we examine fine-grained shadowability annotation experimentally and introduce a new, more valid method of collecting shadowing utterances. Finally, we tentatively derive frame-based shadowability annotations for L2 utterances.