A major challenge with pipeline spoken language understanding systems is that errors in the upstream automatic speech recognition (ASR) engine adversely impact downstream natural language understanding (NLU) models. To address this challenge, we propose an ASR-Robust NLU (AR-NLU) framework that extends a pre-existing NLU model by training it simultaneously on two input streams: human-generated (gold) transcripts and noisy ASR transcripts. We apply contrastive learning to encourage the model to learn the same representations and predictions for both gold and ASR inputs, thereby enhancing its robustness to ASR noise. To demonstrate the effectiveness of this framework, we present two AR-NLU models: Robust Intent DEtection (RIDE) and ASR-Robust BI-encoder for NameD Entity Recognition (AR-BINDER). Experimental results show that our proposed AR-NLU framework is applicable to a variety of NLU models and significantly outperforms the original models on both sequence and token classification tasks.
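To make the contrastive idea concrete, the following is a minimal sketch (not the paper's exact objective) of an InfoNCE-style loss that pulls each gold-transcript embedding toward the embedding of its ASR counterpart while pushing it away from the other transcripts in the batch; the function name and temperature value are illustrative assumptions.

```python
import numpy as np

def info_nce_loss(gold, asr, temperature=0.1):
    """Hypothetical InfoNCE-style contrastive loss aligning gold and ASR
    sentence embeddings. gold[i] and asr[i] form a positive pair; all
    other rows in the batch act as negatives."""
    # L2-normalize rows so dot products are cosine similarities.
    gold = gold / np.linalg.norm(gold, axis=1, keepdims=True)
    asr = asr / np.linalg.norm(asr, axis=1, keepdims=True)
    logits = gold @ asr.T / temperature          # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives lie on the diagonal: gold[i] should match asr[i].
    return -np.mean(np.diag(log_probs))
```

Minimizing such a loss alongside the original task loss encourages the encoder to map a gold transcript and its noisy ASR version to nearby points in representation space, so downstream predictions change little when ASR errors occur.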