Increasingly many applications use deep networks to analyse a speaker's affective state. An undesirable side effect is that models trained for one task (e.g., recognising emotion from speech) can be attacked to infer other, possibly privacy-sensitive attributes of the speaker (e.g., gender). The amount of information an attacker can infer through such attacks is called leakage, and this article presents the first systematic study of the interplay between gender leakage and the main characteristics of the attacker model (family, architecture and training condition). To this end, we define various attack scenarios and perform extensive experiments to analyse privacy risks in Speech Emotion Recognition (SER). Results show that SER models can leak a speaker's gender with an accuracy of 51% to 95% (upper bound), depending on the attack condition. Furthermore, our results provide fresh insights into how to limit the effectiveness of possible attacks and, thereby, ensure privacy preservation.
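To make the attack setting concrete, the following is a minimal, hypothetical sketch (not the method evaluated in this article): an attacker trains a simple classifier on representations produced by an SER model and uses its accuracy on a held-out set as a proxy for gender leakage. The embeddings here are synthetic stand-ins, and the embedding dimensionality, sample size, and attacker choice are illustrative assumptions only.

```python
# Illustrative attribute-inference (leakage) probe on synthetic data.
# The "embeddings" stand in for hidden representations of an SER model;
# the attacker is a classifier trained to recover gender from them.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hypothetical setup: 1000 utterances, 128-dimensional SER representations.
n_utterances, emb_dim = 1000, 128
gender = rng.integers(0, 2, size=n_utterances)        # private attribute (0/1)
embeddings = rng.normal(size=(n_utterances, emb_dim))
# Inject a weak gender-correlated component to mimic residual information
# that an SER representation might retain about the speaker.
embeddings[:, 0] += 0.8 * gender

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, gender, test_size=0.3, random_state=0)

# Attacker model: the better it predicts gender from the representations,
# the more gender information the SER model is taken to leak.
attacker = LogisticRegression(max_iter=1000).fit(X_train, y_train)
leakage = accuracy_score(y_test, attacker.predict(X_test))
print(f"Attacker accuracy (gender-leakage proxy): {leakage:.2f}")
```

In this reading, attacker accuracy near chance (about 50% for a balanced binary attribute) indicates little leakage, while accuracy approaching the upper bound reported above indicates that the SER representation exposes the speaker's gender almost completely.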