In this paper we propose a unified framework for decoding and feature representation based on the maximum a posteriori (MAP) principle. The search space is augmented with an additional feature-stream dimension so that different feature representations can be used for different phonetic contexts within the HMM decoding framework. We also provide a theoretical justification for the unified framework. The framework yields "supervised" signal processing and feature extraction for the recognition system, and it reduced the word recognition error rate by 15% on a large-vocabulary continuous speech recognition task when multiple feature streams were used simultaneously.
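
The idea of augmenting the search space with a feature-stream dimension can be illustrated with a minimal sketch, given here in Python with NumPy. It is not the decoder evaluated in this paper. It assumes a plain Viterbi search over pairs (HMM state, feature stream), a per-state log prior over streams, and per-stream emission log-likelihoods that are already on a comparable scale. The function name `viterbi_multistream` and all parameter names are hypothetical.

```python
import numpy as np


def viterbi_multistream(log_init, log_trans, log_emit, log_stream_prior):
    """Viterbi search over an augmented (state, stream) space.

    Illustrative assumptions (not taken from the paper):
      log_init         : (S,)      log initial state probabilities
      log_trans        : (S, S)    log_trans[i, j] = log P(state j | state i)
      log_emit         : (T, S, K) log-likelihood of frame t in state s,
                                   computed from feature stream k
      log_stream_prior : (S, K)    log prior of using stream k in state s

    Returns the best state sequence, the stream chosen at each frame,
    and the joint log score of that path.
    """
    T, S, K = log_emit.shape
    # Local score of occupying (state, stream) at each frame.
    local = log_emit + log_stream_prior[None, :, :]

    delta = log_init[:, None] + local[0]      # (S, K) best score so far
    back = np.zeros((T, S, K), dtype=int)     # flat index of predecessor

    for t in range(1, T):
        # Best stream and score for each predecessor state.
        prev_stream = delta.argmax(axis=1)    # (S,)
        prev_score = delta.max(axis=1)        # (S,)
        # cand[i, j]: best score ending in state i, then moving to j.
        cand = prev_score[:, None] + log_trans
        prev_state = cand.argmax(axis=0)      # (S,)
        best = cand.max(axis=0)               # (S,)
        delta = best[:, None] + local[t]
        # The predecessor does not depend on the current stream, so
        # the same flat index is broadcast across the stream axis.
        back[t] = (prev_state * K + prev_stream[prev_state])[:, None]

    # Backtrace from the best final (state, stream) pair.
    s, k = divmod(int(delta.argmax()), K)
    score = float(delta[s, k])
    states, streams = [s], [k]
    for t in range(T - 1, 0, -1):
        s, k = divmod(int(back[t, s, k]), K)
        states.append(s)
        streams.append(k)
    states.reverse()
    streams.reverse()
    return states, streams, score
```

Calling `viterbi_multistream(log_init, log_trans, log_emit, log_stream_prior)` with `log_emit` of shape `(T, S, K)` returns one state index and one stream index per frame, so the stream that best explains each frame is selected jointly with the state path rather than fixed in advance.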