Social signal detection is a task in speech technology which has recently became more popular. In the Interspeech 2013 ComParE Challenge one of the tasks was social signal detection, and since then, new results have been published on the dataset. These studies all used the Area Under Curve (AUC) metric to evaluate the performance; here we argue that this metric is not really suitable for social signals detection. Besides raising some serious theoretical objections, we will also demonstrate this unsuitability experimentally: we will show that applying a very simple smoothing function on the output of the frame-level scores of state-of-the-art classifiers can significantly improve the AUC scores, but perform poorly when employed in a Hidden Markov Model. As the latter is more like real-world applications, we suggest relying on utterance-level evaluation metrics in the future.