Large language models have benefited from retrieval-augmented generation (RAG), which retrieves relevant knowledge and supplies it as prompts to enhance natural language understanding. Extending this promising approach to spoken language understanding (SLU) is an important open area of study. This paper introduces a novel RAG framework tailored for SLU, called Retrieval Augmented Speech Understanding (RASU). Given a new spoken utterance, the proposed model first employs the encoder of a pre-trained automatic speech recognition (ASR) model to retrieve relevant speech segments and their transcripts from the training data. The retrieved transcripts and their corresponding intent labels are then formulated as prompts that conditionally guide the SLU decoder during generation. A prompt attention mechanism is further incorporated to strengthen the interaction between the generated outputs and the retrieved prompts. Empirical evaluations demonstrate that RASU substantially outperforms conventional end-to-end and cascaded SLU models on intent prediction from speech. These results highlight the efficacy of retrieval-based prompting and external knowledge sources for improving spoken language understanding, and position RASU as a promising direction for bridging speech retrieval and generative language modeling.
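To make the retrieve-then-prompt pipeline concrete, the following is a minimal sketch of the retrieval and prompt-construction steps described above. It is an illustration under stated assumptions, not the paper's implementation: the datastore layout, mean-pooled 512-dimensional encoder embeddings, cosine similarity as the retrieval metric, and the `transcript ... => intent ...` prompt template are all hypothetical choices not specified in the abstract, and the prompt attention mechanism is omitted.

```python
import numpy as np

# Hypothetical datastore: one ASR-encoder embedding per training utterance,
# paired with its transcript and intent label. Shapes and names are
# illustrative; random data stands in for real encoder outputs.
train_embeddings = np.random.randn(1000, 512).astype(np.float32)
train_transcripts = [f"transcript {i}" for i in range(1000)]
train_intents = [f"intent_{i % 10}" for i in range(1000)]


def retrieve(query_emb: np.ndarray, k: int = 4):
    """Return the k training examples whose ASR-encoder embeddings are
    most similar (cosine similarity, an assumed choice) to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    d = train_embeddings / np.linalg.norm(train_embeddings, axis=1, keepdims=True)
    scores = d @ q
    top_k = np.argsort(-scores)[:k]
    return [(train_transcripts[i], train_intents[i]) for i in top_k]


def build_prompt(retrieved):
    """Format retrieved (transcript, intent) pairs as a conditioning prompt
    for the SLU decoder, one exemplar per line (template is assumed)."""
    lines = [f"transcript: {t} => intent: {y}" for t, y in retrieved]
    return "\n".join(lines) + "\nintent:"


# Usage: embed a new utterance with the frozen ASR encoder, retrieve
# neighbours, and prepend the resulting prompt before decoding the intent.
query = np.random.randn(512).astype(np.float32)  # stand-in for encoder output
prompt = build_prompt(retrieve(query, k=4))
print(prompt)
```

In a real system, the random arrays would be replaced by embeddings computed offline with the pre-trained ASR encoder, and the exact-search `argsort` would typically give way to an approximate nearest-neighbour index once the training set grows large.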