A new acoustic modeling technique, hierarchical mixture densities (HMDs), is presented for handling phone substitutions in pronunciation variation. The lower level of an HMD consists of the continuous Gaussian mixture densities (CGMDs) of phone-unit HMMs; the higher level is a linear combination of these CGMDs, weighted by phone-substitution probabilities. The HMD technique was evaluated on an open-vocabulary, speaker-independent continuous speech recognition task with a vocabulary of 1793 words, of which 52% were new words. The HMDs were found to be most effective for modeling substitutions between vowels. When the HMDs were combined with several phonological rules for phone deletion, word recognition accuracy improved by 10.7% over that of the baseline CGMD HMMs (a 38.5% error reduction). This result also exceeded the accuracy obtained by using 33% more acoustic-phonetic transcriptions of words generated by phonological rules.
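As a sketch of the two-level structure described above, the HMD of a phone unit might be written as follows. The notation is assumed for illustration and is not taken from the original: $b_j(o)$ denotes the CGMD of phone unit $j$ for an observation vector $o$, with mixture weights $c_{jm}$, means $\mu_{jm}$, and covariances $\Sigma_{jm}$; $P(j \mid i)$ denotes the probability that phone $j$ is substituted for phone $i$; and $S(i)$ is a hypothetical set of phones, including $i$ itself, that may be realized in place of $i$. How the states of different phone HMMs are aligned is not specified here, so the state index is omitted, and the normalization of $P(j \mid i)$ is likewise an assumption.

```latex
\begin{align}
  % Lower level: CGMD of phone unit j (a standard Gaussian mixture)
  b_j(o) &= \sum_{m=1}^{M_j} c_{jm}\,
            \mathcal{N}\!\left(o;\, \mu_{jm}, \Sigma_{jm}\right),
  & \sum_{m=1}^{M_j} c_{jm} &= 1, \\
  % Higher level: HMD of phone unit i, a linear combination of CGMDs
  % weighted by (assumed normalized) phone-substitution probabilities
  \tilde{b}_i(o) &= \sum_{j \in S(i)} P(j \mid i)\, b_j(o),
  & \sum_{j \in S(i)} P(j \mid i) &= 1.
\end{align}
```

Under this reading, $\tilde{b}_i(o)$ is itself a Gaussian mixture with weights $P(j \mid i)\, c_{jm}$, so it remains a valid density whenever the substitution probabilities sum to one.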