In recent years, various deep learning-based embedding extraction methods have been proposed for speaker verification. Although these deep embedding extraction methods have shown impressive performance across a range of speaker verification tasks, their performance degrades under mismatched conditions because the embeddings carry variability unrelated to the main task. To alleviate this problem, we propose a novel training strategy that regularizes the embedding network to retain minimal information about nuisance attributes. To achieve this, the proposed method minimizes the mutual information between the speaker embedding and the nuisance labels during training, where the mutual information is estimated using statistics obtained from an auxiliary normalizing flow model. The proposed method is evaluated on cross-lingual and multi-genre speaker verification datasets, and the results show that the proposed strategy effectively reduces within-speaker variability in the embedding space.
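As a rough illustration of the idea only (not the paper's exact formulation), the sketch below combines a speaker-classification loss with a plug-in estimate of the mutual information I(z; y_nuisance), where the required densities come from a minimal class-conditional affine normalizing flow. All module names, dimensions, the uniform nuisance prior, and the regularization weight 0.1 are hypothetical choices introduced for the example.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditionalAffineFlow(nn.Module):
    """Minimal class-conditional affine flow over embeddings:
    u = (z - mu_y) * exp(-log_sigma_y), with a standard-normal base density.
    log q(z | y) follows from the change-of-variables formula."""
    def __init__(self, emb_dim: int, num_nuisance: int):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(num_nuisance, emb_dim))
        self.log_sigma = nn.Parameter(torch.zeros(num_nuisance, emb_dim))

    def log_prob(self, z: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        mu, log_sigma = self.mu[y], self.log_sigma[y]              # (B, D)
        u = (z - mu) * torch.exp(-log_sigma)
        base = -0.5 * (u.pow(2) + math.log(2 * math.pi)).sum(-1)   # log N(u; 0, I)
        return base - log_sigma.sum(-1)                            # + log|det du/dz|

    def log_marginal(self, z: torch.Tensor, log_prior: torch.Tensor) -> torch.Tensor:
        # log q(z) = logsumexp_y [ log q(z | y) + log p(y) ]
        B, K = z.size(0), self.mu.size(0)
        logps = torch.stack(
            [self.log_prob(z, torch.full((B,), k, device=z.device, dtype=torch.long))
             for k in range(K)], dim=-1)
        return torch.logsumexp(logps + log_prior, dim=-1)


def mi_regularizer(flow, z, y, log_prior):
    """Plug-in estimate of I(z; y) = E[ log q(z | y) - log q(z) ] under the flow."""
    return (flow.log_prob(z, y) - flow.log_marginal(z, log_prior)).mean()


# Toy usage: speaker-classification loss plus the mutual-information penalty.
emb_dim, num_speakers, num_nuisance, batch = 192, 100, 4, 32
encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, emb_dim))
speaker_head = nn.Linear(emb_dim, num_speakers)
flow = ConditionalAffineFlow(emb_dim, num_nuisance)
log_prior = torch.full((num_nuisance,), -math.log(num_nuisance))  # uniform nuisance prior

feats = torch.randn(batch, 80)                   # placeholder acoustic features
spk = torch.randint(num_speakers, (batch,))      # speaker labels
nuis = torch.randint(num_nuisance, (batch,))     # nuisance labels (e.g. language, genre)

z = encoder(feats)
loss = F.cross_entropy(speaker_head(z), spk) + 0.1 * mi_regularizer(flow, z, nuis, log_prior)
loss.backward()
```

In practice the flow would typically be trained jointly or alternately with the embedding network so that its density estimates track the current embedding distribution; the single affine transform here only stands in for a more expressive normalizing flow.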