Models such as XLS-R and UniSpeech have proven effective for speech processing across diverse languages, even with limited annotated data, enabling, for instance, the development of transcription systems for under-documented languages. This work tests the hypothesis that these models build "generic" representations of an audio snippet, i.e. representations that do not depend on characteristics irrelevant to understanding the message conveyed. Through two sets of experiments, we assess their ability to abstract away from speaker-specific details and to distill the core informational content, understood here in an informational-communicational sense that we refine later: all the information in the audio signal that provides evidence about the speaker's communicative intent. The results of our experiments show that pre-trained speech models such as XLS-R do not necessarily encode information in the same way depending on the speaker's gender.
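To make the kind of analysis involved more concrete, the sketch below shows one common way to test whether speaker gender remains decodable from a pre-trained model's representations: extract layer-wise XLS-R hidden states, mean-pool them per utterance, and fit a linear probe on gender labels. This is an illustration under assumptions, not the protocol of this paper; the checkpoint name (facebook/wav2vec2-xls-r-300m), the pooling, the probed layer, and the toy placeholder data are all choices made here for the example.

```python
# Illustrative probing sketch (not this paper's exact protocol): pool XLS-R hidden
# states per utterance and check how well a linear probe recovers speaker gender.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from transformers import AutoFeatureExtractor, AutoModel

MODEL_ID = "facebook/wav2vec2-xls-r-300m"  # assumed checkpoint for illustration
extractor = AutoFeatureExtractor.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID).eval()

def utterance_embedding(waveform: np.ndarray, layer: int = 12) -> np.ndarray:
    """Mean-pool the hidden states of one transformer layer over time."""
    inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs, output_hidden_states=True).hidden_states[layer]
    return hidden.mean(dim=1).squeeze(0).numpy()

# Placeholder data: random 2-second "utterances" stand in for a real corpus
# with per-speaker gender metadata (e.g. Common Voice-style labels).
rng = np.random.default_rng(0)
waveforms = [rng.standard_normal(32000).astype(np.float32) for _ in range(20)]
genders = np.array([i % 2 for i in range(20)])  # toy labels: 0 = female, 1 = male

X = np.stack([utterance_embedding(w) for w in waveforms])
probe = LogisticRegression(max_iter=1000)
scores = cross_val_score(probe, X, genders, cv=5)
print(f"Gender probe accuracy at layer 12: {scores.mean():.2f}")
# High probe accuracy would mean gender is still linearly decodable, i.e. the
# representation has not fully abstracted away this speaker characteristic.
```

On real data, running the same probe layer by layer (and comparing downstream task performance split by speaker gender) is one way such gender-dependent encoding differences can be surfaced.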