This paper reports on recent advances in the instrumental quality evaluation of text-to-speech (TTS) synthesis. In particular, a wide range of acoustic quality markers is analyzed with respect to how well they describe perceived quality, using the audiobook data from the Blizzard Challenge 2012. Several approaches to perceptual modeling are investigated and compared. The results reveal substantial correlations as high as 0.87 between subjective ratings of overall impression and their instrumental estimates.
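
To make the reported figure of merit concrete, the following minimal sketch shows how a correlation between subjective ratings and instrumental estimates could be computed. It assumes the Pearson product-moment coefficient, which the text above does not specify; the function name `pearson_r` and all numeric values are hypothetical and serve only as illustration, not as data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-stimulus scores (NOT taken from the Blizzard Challenge data):
# mean listener ratings of overall impression and the model's estimates.
subjective = [2.1, 3.4, 3.9, 2.8, 4.5]
estimated = [2.4, 3.1, 4.2, 2.5, 4.3]

r = pearson_r(subjective, estimated)
print(f"Pearson r = {r:.2f}")
```

A value close to 1 would indicate that the estimates track the listener ratings closely. Whether the correlation is computed per stimulus or per system is a further assumption that this sketch leaves open.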