Perceptual evaluation of non-controlled emotional speech requires delexicalization to neutralize semantic variation. However, most existing methods discard spectral cues that are crucial to emotional attribution and that relate to both laryngeal and supralaryngeal settings. We propose a method that relies on voice morphing to retain part of the spectral information of the original stimuli, applied as an additional step after diphone-synthesis delexicalization. Following an earlier assessment of intelligibility loss, this study evaluates the naturalness of angry and neutral expressions from French films, delexicalized using either low-pass filtering or the proposed method, implemented with MBROLA and STRAIGHT. Results show that morphing-based delexicalization, which leads to accurate emotional attribution, is rated as more natural than low-pass filtering. Implications for research on affective speech are discussed with regard to other delexicalization methods proposed in the literature.