To express natural prosodic variation in continuous speech, HMM-based speech synthesis usually employs sophisticated speech units such as context-dependent phone models. Since the training database cannot practically cover all possible context factors, decision tree-based HMM state clustering is commonly applied. A serious problem with decision tree-based methods is that the criterion used for node splitting and stopping is sensitive to irrelevant outlier data. In this paper, we propose a novel approach to removing outliers during the decision tree growing phase. Experimental results show that removing outlying models improves the quality of the synthesized speech, especially for sentences that originally exhibited poor quality.