This paper reports on our recent advances in using Unit Selection to directly synthesize speech from facial surface electromyographic (EMG) signals generated by the movement of the articulatory muscles during speech production. We achieve a robust Unit Selection mapping by using a more sophisticated unit codebook, generated from a set of base units via a two-stage clustering process: the units are first clustered based on the audio feature vectors they cover and then on the corresponding EMG feature vectors, and a new codebook is built from these cluster assignments. We evaluate different cluster counts for both stages and revisit our evaluation of unit sizes in light of this clustering approach. Our final system achieves a significantly better Mel-Cepstral Distortion score than the Unit Selection based EMG-to-Speech conversion system from our previous work while, thanks to the reduced codebook size, also taking less time to perform the conversion.
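The two-stage codebook construction described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' actual pipeline: the feature dimensions, the hand-rolled k-means, and the `two_stage_codebook` function are all hypothetical stand-ins, assuming only that each base unit carries one audio feature vector and one EMG feature vector, that stage one clusters on audio features, and that stage two sub-clusters each audio cluster on EMG features.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster emptied.
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels

def two_stage_codebook(audio_feats, emg_feats, k_audio, k_emg):
    """Assign each base unit a codebook id via two-stage clustering.

    Stage 1 clusters units by the audio features they cover; stage 2
    sub-clusters each audio cluster by the units' EMG features.
    """
    audio_labels = kmeans(audio_feats, k_audio)
    codebook_ids = np.zeros(len(audio_feats), dtype=int)
    for a in range(k_audio):
        idx = np.where(audio_labels == a)[0]
        if len(idx) == 0:
            continue
        # A small audio cluster may hold fewer units than k_emg.
        k = min(k_emg, len(idx))
        emg_sub = kmeans(emg_feats[idx], k, seed=a)
        # Flatten (audio cluster, EMG sub-cluster) into a single id.
        codebook_ids[idx] = a * k_emg + emg_sub
    return codebook_ids
```

Under these assumptions, the resulting codebook has at most `k_audio * k_emg` entries, which is how the cluster counts of the two stages jointly control the reduced codebook size mentioned above.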