Prosody has traditionally been regarded as useless for word recognition. In this paper, we present a schematic view of how prosody can help word recognition. Our view is expressed in terms of a Bayesian network that models the stochastic dependence among acoustic observation, word, prosody, syntax, and meaning, together with an information-theoretic analysis proving that the mutual information between the acoustic observation and the correct word hypothesis increases when prosody is jointly modeled with the word in a prosody-dependent speech recognition framework. We also report an experiment on the Radio News Corpus in which prosody improved word recognition accuracy by 2.5%.
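
As a minimal sketch of the information-theoretic intuition (the notation $X$ for the acoustic observation, $W$ for the word string, and $P$ for prosody is ours; the paper's full analysis is more detailed), the chain rule of mutual information gives
\[
I(X; W, P) \;=\; I(X; W) + I(X; P \mid W) \;\ge\; I(X; W),
\]
with equality only if the acoustics are conditionally independent of prosody given the words. Whenever prosody explains acoustic variation not already captured by the word string, jointly modeling $(W, P)$ strictly increases the information that the acoustics carry about the recognition hypothesis.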