This paper investigates how intermediate representations in a state-of-the-art automatic speech recognition (ASR) system encode multiple dimensions of speech quality: mean opinion score (MOS), Noisiness, Coloration, Discontinuity, and Loudness. We find that speech quality information is encoded to varying degrees across the ASR encoder layers and is considerably richer than that of the Mel-spectrogram, the input widely used in previous work. This finding motivates us to develop the Attentive Conformer with ASR pretraining, a novel deep learning model that exploits the rich representations of a pretrained ASR model and learns to attend to the most informative layers. Experiments on the NISQA speech quality assessment dataset demonstrate that the proposed model achieves state-of-the-art performance with significantly less training data.
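To make the layer-selection idea concrete, below is a minimal sketch of attentive weighting over pretrained ASR encoder layers, not the paper's actual architecture: all module and parameter names (`LayerAttentivePooling`, `layer_scores`, `frame_scorer`) are illustrative assumptions.

```python
# Sketch: learnable scalar weights select informative ASR encoder layers,
# followed by attentive pooling over time and a multi-dimension quality head.
# Names and shapes are assumptions for illustration, not the paper's code.
import torch
import torch.nn as nn

class LayerAttentivePooling(nn.Module):
    """Mixes hidden states from L encoder layers, then pools over time."""

    def __init__(self, num_layers: int, hidden_dim: int, num_dims: int = 5):
        super().__init__()
        # One learnable score per encoder layer, softmax-normalized.
        self.layer_scores = nn.Parameter(torch.zeros(num_layers))
        # Frame-level scorer for attentive pooling over time.
        self.frame_scorer = nn.Linear(hidden_dim, 1)
        # Regression head: MOS, Noisiness, Coloration, Discontinuity, Loudness.
        self.head = nn.Linear(hidden_dim, num_dims)

    def forward(self, layer_states: torch.Tensor) -> torch.Tensor:
        # layer_states: (batch, num_layers, time, hidden_dim), e.g. stacked
        # hidden states from a frozen, pretrained Conformer ASR encoder.
        w = torch.softmax(self.layer_scores, dim=0)          # (L,)
        mixed = (w.view(1, -1, 1, 1) * layer_states).sum(1)  # (B, T, D)
        a = torch.softmax(self.frame_scorer(mixed), dim=1)   # (B, T, 1)
        pooled = (a * mixed).sum(dim=1)                      # (B, D)
        return self.head(pooled)                             # (B, num_dims)
```

After training, the softmax-normalized `layer_scores` can be inspected to see which encoder layers the model relies on for quality prediction.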