This paper addresses improving the performance of CTC-based models by leveraging the intermediate outputs of all encoder layers with an attention mechanism. Several previous studies have used the intermediate outputs of encoder layers to improve CTC-based models. Here, we focus on the role of each Transformer encoder layer: the encoder layers are divided into a lower and an upper group, and two CTC losses are computed from the intermediate outputs of each group, weighted by an attention mechanism. Dividing the layers into two groups is expected to allow the losses to be computed while taking both acoustic and linguistic features into account. Experimental results showed that the proposed method improved on the baseline recognition performance on the TEDLIUM2 speech data, achieving a WER of 9.9% on the dev set and 11.8% on the test set. Our method outperformed the conventional methods in WER with only a slight increase in inference time as measured by the real-time factor (RTF).
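
As a rough illustration of the idea described above (not the authors' implementation), the sketch below splits a Transformer encoder into lower and upper halves, combines the intermediate outputs of each group with learned attention weights, and produces log-probabilities for two group-wise CTC losses in addition to the final-layer CTC output. The module name, the equal split, the shared CTC projection, and all hyperparameters are illustrative assumptions.

```python
# Minimal PyTorch sketch, assuming an equal lower/upper split of the encoder
# layers and a shared CTC projection; these choices are assumptions, not the
# paper's exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedIntermediateCTC(nn.Module):
    def __init__(self, d_model=256, nhead=4, num_layers=12, vocab_size=500):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
             for _ in range(num_layers)]
        )
        self.half = num_layers // 2
        # One scalar attention weight per layer in each group (softmax-normalized).
        self.lower_attn = nn.Parameter(torch.zeros(self.half))
        self.upper_attn = nn.Parameter(torch.zeros(num_layers - self.half))
        self.ctc_head = nn.Linear(d_model, vocab_size)  # shared CTC projection

    def forward(self, x):  # x: (batch, time, d_model)
        lower_outs, upper_outs = [], []
        for i, layer in enumerate(self.layers):
            x = layer(x)
            (lower_outs if i < self.half else upper_outs).append(x)
        # Attention-weighted sum of the intermediate outputs within each group.
        w_low = F.softmax(self.lower_attn, dim=0)
        w_up = F.softmax(self.upper_attn, dim=0)
        low = sum(w * h for w, h in zip(w_low, lower_outs))
        up = sum(w * h for w, h in zip(w_up, upper_outs))
        # Log-probabilities shaped (time, batch, vocab) for nn.CTCLoss:
        # final-layer output plus the two group-wise auxiliary outputs.
        return (self.ctc_head(x).log_softmax(-1).transpose(0, 1),
                self.ctc_head(low).log_softmax(-1).transpose(0, 1),
                self.ctc_head(up).log_softmax(-1).transpose(0, 1))
```

In training, the three outputs would each be scored with nn.CTCLoss and combined with interpolation weights; those weights are a tuning choice and are not specified here.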