How speech synthesis is evaluated is increasingly being questioned. Not only have conventional listening tests been shown to be a poor match for modern synthesis, but, more fundamentally, important information (e.g., the question asked of the listeners) is frequently missing from reports of evaluation outcomes, despite its impact on how the test results are interpreted. This can cast doubt on the validity of these evaluations. To address this issue, we propose standardising the structure of evaluation reports. To facilitate this standardisation, our contribution is twofold: an open-source subjective evaluation platform and a set of reporting guidelines. The platform is designed to enable the development of easily shareable evaluation recipes. The guidelines complement the platform by supporting researchers in reporting their evaluation choices and analyses in more detail, while relying on the recipe to describe the actual evaluation process.