Speech emotion recognition for natural human-to-human conversations has many useful applications, such as generating comprehensive meeting transcripts or detecting communication problems. We investigate the detection of emotional hotspots, i.e., regions of increased speaker involvement, in technical meetings. As annotated, non-acted corpora are scarce, and to avoid introducing unwanted biases into our models, we follow a cross-corpus approach in which models are trained on data from domains unrelated to the test data. In this work we propose a model ensemble trained on spontaneous phone conversations, political discussions, and acted emotions. Evaluation is performed on the natural ICSI and AMI meeting corpora, where we use the existing hotspot annotations for ICSI and create new labels for the AMI corpus. A semi-supervised fine-tuning procedure is introduced to adapt the model to the target domain. We show that an equal error rate below 21% can be achieved with the proposed cross-corpus approach.