Text-to-Speech (TTS) systems now closely approximate human speech prosody. Yet current deep learning-based TTS systems may struggle to accurately reproduce certain prosodic patterns, such as the phrase boundaries used to signal syntactic distinctions. Because such prosodic parsing can reflect differences in meaning, inconsistencies in synthesis can lead to miscommunication. In this study, we conduct a qualitative assessment of five open-source TTS systems and show that, when given punctuation contrasts, they fail to produce acoustic signals that accurately convey distinct prosodic boundaries (Study 1). To address this gap, we propose a pipeline for improving output using a customized dataset (Study 2), which successfully generates predictable acoustic cues, but only in certain cases. These results suggest that TTS systems require additional training to effectively capture such prosodic subtleties. We conclude by discussing how TTS systems can better generate fine-grained prosodic distinctions.