This paper addresses the problem of automatically grouping unknown speech utterances that are from the same speaker. A clustering method based on maximum purity estimation is proposed, with the aim of maximizing the similarities of voice characteristics between utterances within all the clusters. This method employs a genetic algorithm to determine the cluster where each utterance should be located, which overcomes the limitation of conventional hierarchical clustering that the final result can only reach the local optimum. The proposed clustering method also incorporates a Bayesian information criterion to determine how many clusters should be created.