An approach to modeling long-term consistencies in a speech signal within the framework of a hybrid Hidden Markov Model (HMM) / Multilayer Perceptron (MLP) speaker-independent continuous-speech recognition system is presented. Several ways of modeling male and female speech more accurately with separate models are discussed, one of which is investigated in depth. A method that combines gender-independent and gender-dependent MLP training is demonstrated, improving recognition accuracy while retaining robustness. A series of network architectures (using our training method) for the connectionist estimation of gender-dependent HMM observation probabilities is evaluated in terms of recognition performance and the number of additional parameters required. Experimental evaluation shows a significant improvement in word recognition accuracy over the gender-independent system, with only a moderate increase in the number of parameters.
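To make the hybrid setup concrete, the sketch below (not the authors' code) shows one plausible reading of the architecture described above: an MLP estimates HMM state posteriors from an acoustic frame, the posteriors are divided by state priors to give scaled observation likelihoods, and gender dependence is modeled with separate output layers that share a gender-independent hidden layer and start from the same weights. All layer sizes, class names (`GenderDependentMLP`), and the shared-hidden-layer choice are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of a hybrid HMM/MLP with gender-dependent output layers.
# Sizes, names, and the weight-sharing scheme are assumptions for illustration.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class GenderDependentMLP:
    def __init__(self, n_inputs, n_hidden, n_states, seed=0):
        rng = np.random.default_rng(seed)
        # Shared (gender-independent) hidden layer.
        self.W_h = rng.normal(scale=0.1, size=(n_inputs, n_hidden))
        self.b_h = np.zeros(n_hidden)
        # One output layer per gender, both initialized from the same
        # gender-independent weights before gender-specific fine-tuning.
        W_o = rng.normal(scale=0.1, size=(n_hidden, n_states))
        self.W_out = {"male": W_o.copy(), "female": W_o.copy()}
        self.b_out = {"male": np.zeros(n_states), "female": np.zeros(n_states)}

    def state_posteriors(self, frame, gender):
        # MLP forward pass: p(state | acoustic frame) for the chosen gender.
        h = np.tanh(frame @ self.W_h + self.b_h)
        return softmax(h @ self.W_out[gender] + self.b_out[gender])

def scaled_likelihoods(posteriors, state_priors):
    # Scaled likelihood p(x | state) ∝ p(state | x) / p(state); the HMM uses
    # these values as its observation probabilities.
    return posteriors / state_priors

# Usage example with illustrative dimensions.
mlp = GenderDependentMLP(n_inputs=26, n_hidden=128, n_states=61)
frame = np.zeros(26)                     # one acoustic feature vector
priors = np.full(61, 1.0 / 61)           # uniform state priors for the sketch
obs_probs = scaled_likelihoods(mlp.state_posteriors(frame, "female"), priors)
```

The design choice sketched here, sharing all but the output layer across genders, is only one of several ways the gender-dependent architectures mentioned in the abstract could be realized; it keeps the number of additional parameters limited to the extra output layer.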