Inter-speaker variability is one of the major problems in speaker-independent speech recognition. With similar training data and recognizer structures, speaker-dependent systems far outperform speaker-independent ones. In this work we attempt to apply Maximum Entropy clustering to the speaker clustering problem for large-vocabulary speech recognition. The technique avoids the problems associated with training on insufficient data by building speaker-cluster-dependent models from a weighted sum of cluster-dependent and cluster-independent data. In our experiments, phones are modeled with Hidden Markov Models whose states are weighted Gaussian mixtures. The clustering algorithm adjusts only the mixture weights and leaves every other part of the model unchanged. This yields only a modest improvement: phone accuracy on the DARPA Resource Management task rises from 83.75% to 84.15%, while word accuracy on the Feb89 test set remains virtually unchanged.
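The idea of combining cluster-dependent and cluster-independent data while touching only the mixture weights can be illustrated with a minimal sketch. It assumes the "data" are per-Gaussian occupancy counts for a single HMM state and that they are combined with a single interpolation weight `lam`. The function name, the count-based formulation, and `lam` are illustrative assumptions, not the paper's exact estimator, and the Maximum Entropy clustering step that assigns speakers to clusters is not shown.

```python
import numpy as np


def smoothed_mixture_weights(cd_counts, ci_counts, lam):
    """Hypothetical estimator of cluster-dependent mixture weights for one HMM state.

    cd_counts: per-Gaussian occupancy counts from the speakers in one cluster
    ci_counts: per-Gaussian occupancy counts from all training speakers
    lam:       assumed interpolation weight in [0, 1] on the cluster-dependent data

    Means and variances are not touched; only the mixture weights are re-estimated.
    """
    cd = np.asarray(cd_counts, dtype=float)
    ci = np.asarray(ci_counts, dtype=float)
    if cd.shape != ci.shape:
        raise ValueError("count vectors must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    # Weighted sum of cluster-dependent and cluster-independent statistics.
    combined = lam * cd + (1.0 - lam) * ci
    total = combined.sum()
    if total <= 0.0:
        raise ValueError("no occupancy data for this state")

    # Normalize so the mixture weights sum to one.
    return combined / total


# A cluster that never visited the third Gaussian still gets a nonzero weight for it,
# because the cluster-independent counts fill in where cluster data are missing.
weights = smoothed_mixture_weights([8, 2, 0], [30, 50, 20], lam=0.5)
print(weights)
```

Here the combined counts are `[19, 26, 10]`, so the weights are `[19/55, 26/55, 10/55]`. Raising `lam` moves the weights toward the cluster's own statistics; lowering it moves them toward the speaker-independent model.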