Contextual biasing (CB) is an effective approach for contextualising the hidden features of neural transducer ASR models to improve rare word recognition. CB relies on relatively large quantities of relevant human-annotated natural speech during training, which limits its effectiveness in low-resource scenarios. In this work, we propose a novel approach that reduces the reliance on real speech by using synthesised speech to train CB adapters. We introduce a projection module (PM) that transforms encoder features of synthesised speech prior to CB training to better match real speech. We penalise the PM with a consistency regularisation term that encourages higher similarity between the features of real and synthesised speech. The proposed method maintains the same performance on both named-entity and general datasets while using only half of the real speech data for CB training. Furthermore, we show a 16% word error rate reduction when the full real-speech training dataset is extended with synthetic utterances.
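To make the PM and the consistency regularisation concrete, here is a minimal PyTorch-style sketch. The residual MLP projection, the class name `ProjectionModule`, the hidden width, and the L2 form of the consistency penalty are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProjectionModule(nn.Module):
    """Hypothetical PM: maps encoder features of synthesised speech
    towards the real-speech feature space before CB adapter training."""

    def __init__(self, dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, synth_feats: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the projection close to identity,
        # so the PM only learns the synthetic-to-real feature shift.
        return synth_feats + self.net(synth_feats)


def consistency_loss(real_feats: torch.Tensor,
                     projected_synth_feats: torch.Tensor) -> torch.Tensor:
    # Penalise dissimilarity between real-speech features and projected
    # synthetic-speech features (an L2 consistency term is assumed here;
    # the paper's exact regulariser may differ).
    return F.mse_loss(projected_synth_feats, real_feats)
```

In training, paired encoder features of a real utterance and its synthesised counterpart would be passed through `consistency_loss`, and the resulting penalty added to the CB training objective.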