Observing that most variations in pronunciation depend strongly on the speaker and speaking style, and that introducing pronunciation variants into a speaker-independent recognition system yields limited success, we refrain from applying multiple pronunciation variants in the speaker-independent case and instead introduce pronunciation variants without supervision when specializing the recognizer for a specific speaker. Our approach is to take the decoder's output after a first recognition pass and to realign it, allowing several commonly observed pronunciation variations. In a second decoding pass, the pronunciation variations are integrated into the recognizer, weighted with Maximum Likelihood estimates of the variants' likelihoods computed on the realigned output of the first pass. We observe a small but significant improvement in recognition accuracy compared to the first-pass output and conclude that the method is helpful in adjusting the pronunciation modeling structure to speaker, speaking style, and speaking rate. A better prior choice of possible pronunciation variations, involving deeper phonetic knowledge, would be beneficial for further improvements. We also show experimentally that the improvement gained through pronunciation adaptation overlaps little with the improvement gained by unsupervised adaptation of the acoustic models; rather, the achieved WER reductions are additive.
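The Maximum Likelihood estimate of a pronunciation variant's likelihood reduces to a relative frequency: how often the realignment of the first pass chose that variant for a given word. A minimal sketch, assuming the realigned output is available as (word, variant) token pairs (both names and the data layout are illustrative, not from the paper):

```python
from collections import Counter, defaultdict

def ml_variant_probs(alignments):
    """Estimate pronunciation-variant probabilities by relative
    frequency (the Maximum Likelihood estimate) from a realigned
    first-pass decoding output.

    `alignments`: iterable of (word, variant) pairs recording which
    pronunciation variant the realignment chose for each word token.
    Returns {word: {variant: probability}}.
    """
    counts = defaultdict(Counter)
    for word, variant in alignments:
        counts[word][variant] += 1
    return {
        word: {v: c / sum(vc.values()) for v, c in vc.items()}
        for word, vc in counts.items()
    }
```

For example, if the realignment chose "dh ah" twice and "dh iy" once for the word "the", the second-pass lexicon would weight those variants 2/3 and 1/3 respectively.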