Accurately sensing a user's interest in spoken dialog plays a significant role in many applications, such as tutoring systems and customer service systems. In addition to the widely used acoustic evidence, we introduce different lexical features for interest level prediction and evaluate the impact of automatic speech recognition (ASR) on the effectiveness of lexical information. To capture contextual information, we combine the system's hypothesis for the previous turn with the one for the current turn. Our final system uses a multi-level fusion method for this task. Each fusion step uses different information, such as acoustic and lexical cues, contextual information, or hypotheses from different classifiers. Our experiments show that various combinations improve system performance. In particular, we found that even though the word error rate is quite high, there is still a performance gain from incorporating lexical information obtained from ASR output.