Stuttering detection has been gaining attention as a means of automatically monitoring the condition of persons who stutter and of developing inclusive automatic speech recognition. Stuttering has various symptoms, such as repetitions of sounds, syllables, or words, prolongations of sounds, blocks, and interjections, and its acoustic properties vary greatly across individuals. Although detection performance has improved with the introduction of deep learning techniques and the creation of larger datasets of stuttered speech, these datasets remain much smaller than those of fluent speech, which leads to overfitting of the models. We propose a self-attention weight feature computed over a temporal sequence of acoustic vectors. This feature efficiently captures the temporal structure specific to stuttered speech while being less sensitive to its highly variable acoustic properties. Experimental results show that a model using multi-layer self-attention weight features of wav2vec 2.0 outperforms a previous attention-based model built on wav2vec 2.0.
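To illustrate why self-attention weights can expose the temporal structure of stuttered speech independently of absolute acoustic values, the following is a minimal sketch of scaled dot-product self-attention weights over a toy frame sequence. The frame vectors, the identity query/key projections, and the helper names (`softmax`, `self_attention_weights`) are illustrative assumptions, not the paper's actual wav2vec 2.0 pipeline:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention_weights(frames):
    """Scaled dot-product self-attention weights for a sequence of
    acoustic frame vectors (identity Q/K projections for simplicity).
    Returns a T x T matrix; row t is a distribution over time."""
    d = len(frames[0])
    scores = [[sum(q_i * k_i for q_i, k_i in zip(q, k)) / math.sqrt(d)
               for k in frames] for q in frames]
    return [softmax(row) for row in scores]

# Toy 4-frame sequence in which frames 0 and 2 are identical,
# mimicking a sound repetition (a typical stuttering symptom).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.2, 0.8]]
W = self_attention_weights(frames)
# The repetition shows up as symmetric off-diagonal attention mass
# between frames 0 and 2, regardless of the frames' absolute values:
# frame 0 attends more strongly to its repeat (W[0][2]) than to the
# dissimilar frame 1 (W[0][1]).
```

In this view, the attention-weight matrix encodes *which frames resemble which other frames over time* rather than the frames' raw acoustic content, which is the intuition behind using it as a speaker-robust feature for detecting repetitions and prolongations.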