In this paper we describe our experience with bottom-up and top-down state clustering techniques for the definition and training of robust acoustic-phonetic units. Using as a test-bed a speaker-independent, telephone-speech isolated word recognition task with a vocabulary including 475 city names, we show that similar performance is obtained by tying the HMM states with either an agglomerative or a decision-tree clustering approach. Moreover, better results are obtained by selecting a priori the set of states that can be clustered, rather than relying solely on their acoustic similarity. In the bottom-up approach, we propose a stopping criterion for the furthest-neighbor clustering procedure that does not require a threshold. In the top-down approach, we show that a carefully selected impurity function allows lookahead search to outperform the classical decision-tree growing algorithm.