This paper describes a knowledge distillation approach to non-parallel many-to-many voice conversion (VC) that builds on two self-supervised speech representation techniques: the perturbation-resistant variational autoencoder (PRVAE) and hidden-unit BERT (HuBERT). PRVAE-VC achieved a breakthrough by significantly improving both non-streaming and low-latency streaming VC performance. However, a notable gap persists between converted speech and real target speech, which remains a challenge. To narrow this gap, we present PRVAE-VC2, an improved version of PRVAE-VC that leverages the rich, contextually informed representations derived from a pre-trained HuBERT model. Furthermore, we apply knowledge distillation so that PRVAE-VC2 remains streamable, ensuring that the key advantage of PRVAE-VC as a streamable VC method is not compromised. Evaluation results demonstrate that our approaches effectively reduce the discrepancy between converted and target speech. Audio samples are available on our webpage.
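
The general idea behind making the HuBERT-based model streamable via distillation can be sketched as follows. This is a minimal illustration, not the authors' implementation: a causal "student" encoder (all module names, sizes, and the MSE objective are assumptions for illustration) is trained to regress frame-level features produced by a pre-trained, non-causal HuBERT "teacher", so that the student can replace the teacher at streaming conversion time.

```python
# Hedged sketch of representation distillation into a streamable encoder.
# Not the PRVAE-VC2 implementation; all hyperparameters are illustrative.
import torch
import torch.nn as nn

class CausalStudentEncoder(nn.Module):
    """Streamable stand-in for HuBERT: causal convolution only (no future context)."""
    def __init__(self, n_mels=80, hidden=256, out_dim=768):
        super().__init__()
        # Symmetric padding followed by right-trimming makes the conv causal.
        self.conv = nn.Conv1d(n_mels, hidden, kernel_size=5, padding=4)
        self.proj = nn.Conv1d(hidden, out_dim, kernel_size=1)

    def forward(self, mel):                                   # mel: (batch, n_mels, frames)
        h = torch.relu(self.conv(mel))[..., :mel.size(-1)]    # keep only causal outputs
        return self.proj(h)                                    # (batch, out_dim, frames)

student = CausalStudentEncoder()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

# Placeholder batch: in practice, `teacher_feats` would be HuBERT features
# extracted (offline, non-causally) from the same utterances as `mel`.
mel = torch.randn(4, 80, 200)
teacher_feats = torch.randn(4, 768, 200)

pred = student(mel)
loss = nn.functional.mse_loss(pred, teacher_feats)  # distillation (feature regression) loss
loss.backward()
optimizer.step()
```

In this sketch the student sees only past and current frames, so it can run with low latency at inference, while the distillation loss pulls its outputs toward the contextually rich teacher representations.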