ISCA Archive Interspeech 2024

Reinforcement Learning from Answer Reranking Feedback for Retrieval-Augmented Answer Generation

Minh Nguyen, Toan Quoc Nguyen, Kishan KC, Zeyu Zhang, Thuy Vu

Retrieval-augmented generation (RAG) is a method for improving the accuracy and reliability of large language models (LLMs) in open-domain question answering (ODQA). Traditional approaches rely on supervised learning, which can lead to misalignment between user intent and system output. Reinforcement learning from human feedback (RLHF) addresses this issue by training a reward model on human preference feedback. In this work, we introduce a novel RLHF framework for ODQA that leverages existing large-scale answer reranking datasets to train a reward model. In particular, our reward model for ODQA plays two complementary roles: (i) providing ranking scores as rewards for PPO, and (ii) retrieving relevant facts that enable the ODQA system to formulate a factual answer. Experimental results indicate that our proposed framework is effective for RLHF, leading to near-expert performance for ODQA.
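The reward model's dual role described above can be sketched in miniature. This is an illustrative sketch only: the function names are hypothetical, and the toy lexical-overlap scorer stands in for the paper's trained reranking reward model, whose scores would serve as scalar rewards in a PPO update.

```python
# Illustrative sketch: a toy reranker plays the reward model's two roles,
# (i) scoring candidate answers (the top score becoming the PPO reward)
# and (ii) surfacing the retrieved facts that ground the final answer.
# All names here are hypothetical, not from the paper.

def rerank_score(question: str, answer: str, facts: list[str]) -> float:
    """Toy stand-in for a learned reward model: score a candidate answer
    by its token overlap with the retrieved supporting facts."""
    fact_tokens = {tok for fact in facts for tok in fact.lower().split()}
    answer_tokens = set(answer.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & fact_tokens) / len(answer_tokens)

def select_answer_and_reward(question: str,
                             candidates: list[str],
                             facts: list[str]) -> tuple[str, float]:
    """Rank candidate answers with the reward model; the best candidate's
    score would be used as the scalar reward signal for a PPO-style update."""
    ranked = sorted(candidates,
                    key=lambda ans: rerank_score(question, ans, facts),
                    reverse=True)
    best = ranked[0]
    return best, rerank_score(question, best, facts)

# Usage: the highest-scoring candidate and its score-as-reward.
facts = ["Paris is the capital of France"]
best, reward = select_answer_and_reward(
    "What is the capital of France?", ["Paris", "London"], facts)
```

In the paper's actual framework, the reranker is a trained model and the reward feeds a PPO loop over the generator; this sketch only shows the data flow of "ranking score doubles as reward."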