We describe three perception studies in which subjects are presented with film fragments (without any dialogue context) of speakers interacting with a spoken dialogue system. In half of these fragments, the speaker is or becomes aware of a communication problem. In a forced-choice task, subjects have to determine which fragments are problematic. In all three studies, subjects are able to perform this task to some extent, though with varying rates of correct classification. We conclude that combining auditory and visual information is beneficial for problem detection.