Recent advances in unsupervised speech representation learning have introduced new approaches and set a new state of the art for diverse speech processing tasks. This paper extends the investigation of wav2vec 2.0 deep speech representations for the speaker recognition task. It focuses on robustness issues across different domains and considers the effectiveness of wav2vec not only on telephone and microphone speaker verification protocols but also on the cross-channel task. It is concluded that powerful transformer-based speaker recognition systems can generalize well across variable conditions. In this study, speaker recognition systems were analyzed on a wide range of well-known verification protocols. Based on the results obtained in this paper, we recommend using data augmentation when fine-tuning wav2vec-based systems.