Unsupervised domain adaptation (UDA) can tackle the mismatch between the source and target domains in real-world speaker verification applications. In this paper, we propose a UDA method that leverages target-domain data through self-supervised learning. First, we use momentum contrastive learning to effectively exploit the latent speaker labels in the target domain, simultaneously enhancing intra-speaker compactness and inter-speaker separability. Second, we improve the inter-speaker feature distribution alignment loss, ensuring the stability of the source-domain statistics and mitigating the impact of false negative pairs. These two methods are further combined with conventional supervised learning in the source domain. Experiments with VoxCeleb2 as the source domain and CN-Celeb1 as the target domain demonstrate the effectiveness of the proposed method.
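To make the momentum contrastive objective concrete, the sketch below shows a minimal MoCo-style InfoNCE loss with an EMA-updated key encoder, as one might apply it to unlabeled target-domain speaker embeddings. This is an illustrative sketch only: the function names, the temperature `tau`, and the momentum coefficient `m` are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def momentum_update(q_params, k_params, m=0.999):
    """EMA update of the key encoder from the query encoder (MoCo-style).

    q_params / k_params are lists of parameter arrays; m is an assumed
    momentum coefficient (a common default, not necessarily the paper's).
    """
    return [m * k + (1.0 - m) * q for q, k in zip(q_params, k_params)]

def info_nce_loss(query, pos_key, queue, tau=0.07):
    """InfoNCE loss for one anchor utterance embedding.

    query, pos_key: (d,) L2-normalized embeddings of two segments assumed
    to share a latent speaker label; queue: (K, d) negative embeddings.
    """
    l_pos = query @ pos_key          # positive similarity (scalar)
    l_neg = queue @ query            # negative similarities, shape (K,)
    logits = np.concatenate(([l_pos], l_neg)) / tau
    logits -= logits.max()           # numerical stability before softmax
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

Pulling the positive pair together while pushing it away from a queue of negatives is what yields the intra-speaker compactness and inter-speaker separability described above; the momentum encoder keeps the queued keys consistent as training progresses.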