In this paper, a method for generating word pronunciation networks for speech recognition is proposed. The networks incorporate different acceptable pronunciation variants of each word. These variants are determined by applying pronunciation rules to the standard pronunciation of the words. Instead of a manual search, an automatic learning procedure is used to compose a sensible set of rules. The learning algorithm compares the standard pronunciation of each utterance in a training corpus with its auditory transcription (i.e. 'how should it be pronounced' versus 'how was it actually pronounced'). It is shown that the latter transcription can be constructed with the assistance of a speech recognizer. Experimental results on a Dutch database and on TIMIT demonstrate that the pronunciation networks significantly reduce the word error rate.
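To make the rule-application step concrete, the sketch below enumerates pronunciation variants by optionally applying context-free phoneme rules to a canonical (standard) transcription. The rule set (`RULES`), the example phonemes, and the function name are illustrative assumptions, not the rules learned in the paper.

```python
from itertools import product

# Hypothetical context-free rules: each phoneme may optionally be realized
# as an alternative, or deleted (shown as "").
RULES = {
    "n": ["n", "m"],   # place assimilation (illustrative rule)
    "t": ["t", ""],    # word-final /t/ deletion (illustrative rule)
}

def variants(canonical):
    """Enumerate pronunciation variants of a canonical phoneme sequence by
    optionally applying each rule at every position where it matches."""
    options = [RULES.get(p, [p]) for p in canonical]
    seen = set()
    for choice in product(*options):
        seen.add(tuple(p for p in choice if p))  # drop deleted phonemes
    return sorted(seen)

# Canonical /k a n t/ yields variants with assimilated /n/ and deleted /t/.
print(variants(["k", "a", "n", "t"]))
```

In the proposed method, such variants would be collapsed into a single pronunciation network per word rather than listed exhaustively, and the rule set would be selected automatically by comparing standard pronunciations with auditory transcriptions over a training corpus.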