Modern speech synthesis models have achieved increasingly human-like outputs, and in particular have been shown to be practically indistinguishable from natural speech at the phone and word tiers. Still, many text-to-speech (TTS) models have been observed to contain errors at the prosodic level. The most commonly employed measures of synthesized speech quality, such as mean opinion scores, lack linguistically meaningful information about the prosodic plausibility of speech. In this paper, we explore methods for evaluating the effectiveness of prosodic encodings in language models by cross-analyzing state-of-the-art TTS models with corpus data of natural speech. Through automatic signal processing and exploratory statistical analysis, we examine an array of prosodic and acoustic features related to prominence and phrasing, including pitch, duration, and intensity. Our analysis suggests that the most significant of these prosodic indicators of TTS naturalness depend on the correct assignment of major intonational events and phrasal pausing. Based on these results, we propose several quantitative measures to capture the prosodic accuracy of synthesized speech. These results have important implications for furthering a theoretical understanding of perceptual importance in speech prosody, and also help reveal limitations of the prosodic knowledge in current deep learning speech technologies.