Generalized context-dependent subword modeling is part of an effort to develop a robust speech recognition system for a variety of applications over the telephone network. In this paper we investigate two major issues: (1) linguistically motivated context clustering, which captures the similarity of contextual effects and reduces the number of context-dependent categories; and (2) phone-specific Multi-Layer Perceptron (MLP) structures, in which each phone is modeled by one or more networks and the number of outputs in each network is based on the number of left and right contexts occurring in a training database.
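As a rough illustration of how the two issues interact, the sketch below counts the distinct left and right contexts observed for each phone in a toy training set, with and without clustering neighbouring phones into broad classes. The class map `BROAD_CLASS`, the data in `TRAINING`, and the function names are hypothetical and chosen only for this example; they are not the clustering or the database used in this work, and how the counts are turned into an actual output layer is left open.

```python
from collections import defaultdict

# Hypothetical broad-class map, for illustration only (not from the paper).
BROAD_CLASS = {
    "p": "stop", "t": "stop", "k": "stop",
    "s": "fricative", "f": "fricative",
    "m": "nasal", "n": "nasal",
    "iy": "front_vowel", "ae": "front_vowel",
    "aa": "back_vowel", "uw": "back_vowel",
    "sil": "silence",
}

# Toy phone transcriptions standing in for a training database (assumed data).
TRAINING = [
    ["k", "ae", "t"],
    ["s", "ae", "t"],
    ["m", "ae", "n"],
    ["p", "ae", "t"],
    ["t", "iy", "m"],
]


def context_classes(utterances, cluster):
    """Collect the (clustered) left and right contexts seen for each phone."""
    left = defaultdict(set)
    right = defaultdict(set)
    for phones in utterances:
        # Pad with silence so the first and last phones also have contexts.
        padded = ["sil"] + list(phones) + ["sil"]
        for prev, cur, nxt in zip(padded, padded[1:], padded[2:]):
            # Phones missing from the map are kept as their own category.
            left[cur].add(cluster.get(prev, prev))
            right[cur].add(cluster.get(nxt, nxt))
    return left, right


def output_sizes(utterances, cluster=BROAD_CLASS):
    """Return {phone: (n_left_contexts, n_right_contexts)} for the data."""
    left, right = context_classes(utterances, cluster)
    return {phone: (len(left[phone]), len(right[phone])) for phone in sorted(left)}


if __name__ == "__main__":
    clustered = output_sizes(TRAINING)
    unclustered = output_sizes(TRAINING, cluster={})  # empty map: no clustering
    for phone in clustered:
        print(phone, "clustered:", clustered[phone], "unclustered:", unclustered[phone])
```

In this toy data the phone `ae` has four distinct left neighbours (`k`, `s`, `m`, `p`) but only three left context classes after clustering, since `k` and `p` both map to `stop`. This is the kind of reduction in context-dependent categories that clustering is meant to provide.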