This paper discusses three broad obstacles that must be overcome to improve prosodic quality in text-to-speech systems. First, direct and indirect limits set by the signal processing ("synthesis") components. Second, combinatorial and statistical constraints inherent in generalizing from training corpora to unrestricted domains, and that require the integration of contentspecific knowledge and detailed mathematical modeling. Third, the nature of many empirical research issues that must be solved for prosodic modeling to improve: they are often too focused and model-dependent for academe, and too long-term for development organizations.