Boosted by self-supervised learning (SSL) on large amounts of unlabeled data, computationally demanding transformer-based audiovisual ASR (AV-ASR) achieves state-of-the-art performance. In this work, we are the first to propose teacher-student model distillation for an efficient and noise-robust AV encoder for AV-ASR. First, we compare two options for the teacher, a non-task-specific and a task-specific one. Second, we investigate the design and the components of the student neural network. Third, we explore loss function choices for distillation. Distilled with a simplified loss function, the final efficient conformer-based student has 69% fewer parameters and requires 23% less computation than the teacher, yet outperforms the baseline student with a WER of 4.6% (vs. 11.4%) in clean conditions and 20.2% (vs. 35.7%) in 0 dB babble noise. Averaged over noise types at 0 dB SNR, our proposed student even achieves more than 50% relative WER reduction compared to the baseline student.