We present SQ-AST, a transformer-based model for non-intrusive speech quality prediction. The model predicts overall speech quality and four perceptual dimensions (noisiness, discontinuity, coloration, and loudness) from the degraded signal alone, without requiring a reference. SQ-AST builds on the Audio Spectrogram Transformer (AST), pretrained on large-scale audio datasets and fine-tuned on diverse speech quality corpora, and operates on short speech clips of 4–12 seconds. Training was conducted on 106 databases comprising 165,791 samples, and independent evaluations confirm strong generalization to real-world conditions. The model is currently under consideration for ITU-T standardization, underscoring its potential for benchmarking, quality assessment, and industry adoption.