Defensive communication is known to have detrimental effects on the quality of social interactions. Hence, recognising and reducing defensive behaviour is crucial to improving professional and personal communication. We introduce DefComm-DB, a novel multimodal dataset comprising video recordings in which one of the following types of defensive communication is present: (i) verbally attacking the conversation partner, (ii) withdrawing from the communication, (iii) making oneself greater, and (iv) making oneself smaller. Subsequently, we present a machine learning approach for the automatic classification of DefComm-DB. In particular, we utilise wav2vec2, autoencoders, a pre-trained CNN and openSMILE for feature extraction from the audio modality. For the text stream, we apply ELECTRA and SBERT. On the unseen test set, our models achieve an Unweighted Average Recall of 49.4 % and 52.2 % for the audio and text modalities, respectively, showing the feasibility of the introduced challenge.