To improve a speech synthesis system effectively, it is important to identify and focus on the modules that contribute most to the degradation of naturalness in synthesized speech. In this paper, we describe the design of a perception experiment that measures the effect of each module separately. The experiment uses synthesized speech stimuli whose intermediate information is modified during the synthesis process. Applying this experiment to a Japanese concatenative speech synthesis system revealed that the text processing module and the part of the feature prediction module that predicts the fundamental frequency were the major factors in degrading naturalness.