We propose a novel model compression approach using multiple-teacher pruning-based self-distillation for audio-visual wake word spotting, enabling compact neural network implementations without sacrificing system performance. At each stage of the proposed framework, we prune the teacher model obtained in the previous stage to generate a student model, then fine-tune it with teacher-student learning and use it as a new teacher model for subsequent stages. A normalized intra-class loss is designed to optimize this pruning-based self-distillation (PSD) process. Both single-teacher PSD (ST-PSD) and multi-teacher PSD (MT-PSD) are adopted in the fine-tuning process at each stage. When tested on audio-visual wake word spotting in the MISP2021 Challenge, the two proposed techniques outperform state-of-the-art methods in both system performance and model efficiency. Moreover, MT-PSD, which leverages the complementarity of multiple teachers obtained at different stages, also outperforms ST-PSD.
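To make the staged prune, fine-tune, and promote procedure concrete, the sketch below gives one possible reading of the PSD loop in PyTorch. All names (`prune_model`, `distill_step`, `psd_compress`), the pruning ratio, and the use of a standard soft-target distillation loss in place of the paper's normalized intra-class loss are illustrative assumptions, not the authors' implementation. ST-PSD corresponds to distilling from only the most recent teacher, while MT-PSD averages the soft targets of all teachers accumulated over earlier stages.

```python
import copy
import torch
import torch.nn.functional as F
import torch.nn.utils.prune as prune


def prune_model(model, amount=0.3):
    """Apply L1 unstructured pruning to every Conv2d/Linear layer (assumed ratio)."""
    for module in model.modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # make the pruning mask permanent
    return model


def distill_step(student, teachers, audio, video, labels, T=2.0, alpha=0.5):
    """One fine-tuning step: hard-label loss plus an averaged multi-teacher KD loss.

    NOTE: a generic KD soft-target loss stands in for the paper's normalized
    intra-class loss, which is not specified in the abstract.
    """
    logits = student(audio, video)
    hard_loss = F.cross_entropy(logits, labels)
    with torch.no_grad():
        # MT-PSD: average softened outputs of all teachers from earlier stages;
        # ST-PSD would pass only teachers[-1:].
        teacher_probs = torch.stack(
            [F.softmax(t(audio, video) / T, dim=-1) for t in teachers]
        ).mean(dim=0)
    soft_loss = F.kl_div(
        F.log_softmax(logits / T, dim=-1), teacher_probs, reduction="batchmean"
    ) * T * T
    return alpha * hard_loss + (1.0 - alpha) * soft_loss


def psd_compress(base_model, train_loader, num_stages=3, epochs=5, lr=1e-4):
    """Multi-stage PSD: each stage prunes the latest teacher, fine-tunes the
    student by distillation, then promotes it to a teacher for later stages."""
    teachers = [base_model.eval()]
    for _stage in range(num_stages):
        student = prune_model(copy.deepcopy(teachers[-1])).train()
        optimizer = torch.optim.Adam(student.parameters(), lr=lr)
        for _ in range(epochs):
            for audio, video, labels in train_loader:
                loss = distill_step(student, teachers, audio, video, labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        teachers.append(student.eval())  # student becomes a new teacher
    return teachers[-1]  # most compact model from the final stage
```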