This paper presents a novel framework for automatically detecting agreement and disagreement utterances in natural conversation. Such detection is critical for conversation understanding tasks such as meeting summarization. One difficulty of detecting agreement and disagreement utterances in natural conversation is the ambiguity of the utterance unit. Utterances are usually segmented at short pauses, but in conversation speakers often utter multiple sentences in one breath. Such utterances exhibit the characteristics of agreement or disagreement in only some parts, not across the whole utterance. This is problematic for conventional methods, which assume that each utterance is a single sentence and extract global features from the whole utterance. To deal with this problem, we propose a detection framework that uses only local prosodic and lexical features. The local features are extracted from short windows that cover just a few words. Posteriors of agreement, disagreement, and others are estimated window by window and integrated to yield a final decision. Experiments on free-discussion speech show that the proposed method, through its use of local features, achieves significantly higher accuracy in detecting agreement and disagreement utterances.
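
The window-by-window estimation and integration described above can be illustrated with a minimal sketch. The abstract does not state the window length, the classifier, or the integration rule, so everything here is an assumption made for illustration: the function names `sliding_windows` and `integrate_posteriors`, the three-word window, the 0.5 threshold, and the max-pooling rule are all hypothetical, and the per-window posteriors are taken as given rather than computed from prosodic/lexical features.

```python
from typing import List, Sequence, Tuple

# Posterior triple per window: (agreement, disagreement, others).
Posterior = Tuple[float, float, float]


def sliding_windows(words: Sequence[str], size: int = 3) -> List[List[str]]:
    """Split an utterance into overlapping windows of `size` words (step 1).

    The window size is an illustrative assumption, not a value from the paper.
    """
    if len(words) <= size:
        return [list(words)]
    return [list(words[i:i + size]) for i in range(len(words) - size + 1)]


def integrate_posteriors(
    window_posteriors: Sequence[Posterior], threshold: float = 0.5
) -> str:
    """Combine per-window posteriors into one utterance-level label.

    Assumed rule (not from the paper): max-pool the agreement and
    disagreement posteriors over windows, so that a single strongly
    marked window can decide the label even when the rest of the
    utterance is neutral. Below `threshold`, fall back to "others".
    """
    if not window_posteriors:
        return "others"
    best_agree = max(p[0] for p in window_posteriors)
    best_disagree = max(p[1] for p in window_posteriors)
    if max(best_agree, best_disagree) < threshold:
        return "others"
    return "agreement" if best_agree >= best_disagree else "disagreement"


if __name__ == "__main__":
    utterance = "yeah that is right and then we should move on".split()
    windows = sliding_windows(utterance, size=3)
    # Hypothetical posteriors, one per window; a real system would
    # estimate these from local prosodic/lexical features.
    posteriors = [(0.8, 0.05, 0.15)] + [(0.1, 0.1, 0.8)] * (len(windows) - 1)
    print(integrate_posteriors(posteriors))
```

Max-pooling is used here only because it reflects the motivation given above, namely that agreement or disagreement may surface in just one part of a multi-sentence utterance; other integration rules, such as averaging log-posteriors, would fit the same interface.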