This paper deals with acoustic-phonetic decoding for continuous speech recognition (CSR). Two processing modules are used, depending on whether the speech input is steady or transient. First, a steady-state speech processing module, called phone-based anchor point detection, performs preprocessing that restricts the search to a subset of the vocabulary under consideration. Second, a general processing module, based on diphone-like units, performs the actual decoding task. The work focuses on the combined use of the information produced by preprocessing and of lexical knowledge to guide the main decoding stage. Speaker-dependent experiments were run on 50 CVCV sentences. A 275-word lexicon is used to evaluate word recognition performance. No syntactic information is included; the perplexity of the language is about 140, a value that results from the CVCV constraint imposed on word sequences. Results are detailed separately for content and function words. The global word recognition rate is about 85%.
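
As an illustration of how anchor points might restrict the vocabulary before the main decoding stage, the following minimal sketch keeps only the lexicon entries whose phone string contains the detected anchor phones in order. The matching rule (ordered subsequence), the function names, and the toy lexicon are all assumptions made for this example; the abstract does not specify the actual selection procedure.

```python
from typing import Dict, List, Sequence


def is_subsequence(anchors: Sequence[str], phones: Sequence[str]) -> bool:
    """Return True if the anchors occur in `phones` in the same order."""
    remaining = iter(phones)
    # `a in remaining` advances the iterator, so order is enforced.
    return all(a in remaining for a in anchors)


def select_candidates(lexicon: Dict[str, List[str]],
                      anchors: Sequence[str]) -> List[str]:
    """Keep words compatible with the detected anchor phones (assumed rule)."""
    return [word for word, phones in lexicon.items()
            if is_subsequence(anchors, phones)]


# Hypothetical toy lexicon of CVCV words; not taken from the paper.
TOY_LEXICON = {
    "papa": ["p", "a", "p", "a"],
    "midi": ["m", "i", "d", "i"],
    "mari": ["m", "a", "r", "i"],
    "tapi": ["t", "a", "p", "i"],
}

# Suppose the steady-state module detected the vowels /a/ then /i/.
candidates = select_candidates(TOY_LEXICON, ["a", "i"])
print(candidates)
```

Only the surviving candidates would then be passed to the diphone-based decoder, which is where the reduction in search effort would come from under these assumptions.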