This paper compares three approaches to building phoneme-specific Gaussian mixture model (GMM) speaker recognition systems, evaluated on the NIST 2003 Extended Data Evaluation, against a baseline GMM system that covers all phonemes. Individually, every phoneme-specific GMM system performs worse than the baseline GMM. However, fusing the scores of the 40 best-performing individual phoneme systems in the 8-conversation training condition yielded an equal error rate (EER) of 1.7%, an absolute reduction of 2.6 percentage points relative to the baseline system. Further investigation showed that the three model-building approaches carry complementary information: error rates dropped on a per-phoneme basis when the systems built with the different approaches were fused.