We propose an unsupervised adaptation approach based on quality-aware masking (QM) to improve target-speaker voice activity detection (TS-VAD) in speaker diarization (SD) by reducing potential errors in the generated pseudo-labels. Furthermore, the QM-TS-VAD adapted model can serve as a teacher to fine-tune a student SD model through knowledge distillation (KD), mitigating overfitting. Evaluated on the eight domains of the DIHARD-III evaluation corpus, the proposed QM-TS-VAD approach effectively enhances SD performance, and the introduced KD method further reduces errors in seven of the eight domains. Finally, the proposed framework outperforms the unsupervised adaptation approach used in the top-ranked system submitted to the DIHARD-III Challenge.