Articulatory features (AF) have recently been proposed as an alternative representation to acoustic features (ACF), and combining an AF model with an ACF model has been shown to outperform the ACF model alone. In this paper, we investigate several ways to further improve this combination. First, we propose a multiple-distribution AF model that increases model resolution by modeling different sub-phone segments separately. We then introduce an asynchronous combination of this multiple-distribution AF model with an ACF model, which allows AF model "states" to be combined flexibly with different ACF model states. Second, we incorporate AF information into ACF model training so that the ACF model is optimized to give the best performance when combined with the AF model for decoding. Together, the two techniques yield an absolute improvement of 2.5% in TIMIT phone recognition over the corresponding ACF model baseline.