Naturalness is among the most widely evaluated aspects of Text-To-Speech (TTS) synthesizers. Despite its popularity, it has consistently been identified as a “nebulous” and “poorly defined” concept, left to a listener’s subjective interpretation of the term. Without a proper definition, researchers either continue to promote under-informative evaluation designs or argue in favour of rendering the term obsolete. As better methods of evaluation are being standardized, this paper presents a discussion around the definition of naturalness. Specifically, we describe naturalness as a multi-faceted perceptual attribute. While listener interpretations of the term have been covered in previous literature, this paper presents a top-down approach: we outline the perspectives on naturalness held by different practitioners of TTS or synthetic voices. First, we discuss why human-likeness is a desirable and important target in the development of speech synthesizers, and we categorize the scope of naturalness within human-likeness according to its use-cases. We next describe why a standalone understanding of human-likeness is not sufficient, and therefore provide an explanation of naturalness as appropriateness. The aim of this paper is to open a discussion around the meaning of naturalness, so that clear directions for its evaluation can be established.