ISCA Archive Interspeech 2025

Co-Speech Motion for Virtual Agents in Dialogue Using LLM-Driven Primitive Action Selection

Muhammad Yeza Baihaqi, Angel García Contreras, Seiya Kawano, Koichiro Yoshino

Non-verbal behaviors, such as co-speech motion, are essential for making artificial agents more lifelike and engaging. However, existing approaches to generating co-speech motion still face significant challenges. Rule-based systems can produce natural and engaging motions but struggle to generalize, while common data-driven methods can generate a wide variety of gestures yet incur high costs and require substantial adaptation before they can be deployed on new agents. Leveraging the contextual planning and understanding capabilities of Large Language Models (LLMs), we propose an LLM-based motion control model that uses a primitive action selection strategy. This approach is expected to provide a more flexible and scalable solution for generating contextually appropriate co-speech motions across various embodied systems, including virtual agents and robots.
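To make the idea of primitive action selection concrete, the following minimal Python sketch illustrates one possible way an LLM could be prompted to choose a single motion primitive for an agent's utterance. The action inventory, prompt wording, and the query_llm stub are illustrative assumptions for this sketch, not the implementation described in the paper.

```python
# Hypothetical sketch of LLM-driven primitive action selection.
# The action set and prompt below are assumptions, not the authors' design.

PRIMITIVE_ACTIONS = ["nod", "head_shake", "beat_gesture", "point", "wave", "idle"]


def build_prompt(utterance: str, dialogue_context: str) -> str:
    """Ask the LLM to pick exactly one primitive action for the utterance."""
    return (
        "You control the co-speech motion of a virtual agent.\n"
        f"Dialogue context: {dialogue_context}\n"
        f"Agent utterance: {utterance}\n"
        f"Choose exactly one action from: {', '.join(PRIMITIVE_ACTIONS)}.\n"
        "Answer with the action name only."
    )


def query_llm(prompt: str) -> str:
    """Placeholder for any LLM client call (swap in a real chat API here)."""
    return "nod"  # stubbed response so the sketch runs without an API key


def select_action(utterance: str, dialogue_context: str) -> str:
    """Map the LLM's free-text reply onto the closed primitive inventory."""
    reply = query_llm(build_prompt(utterance, dialogue_context)).strip().lower()
    # Fall back to a neutral action if the model answers out of vocabulary.
    return reply if reply in PRIMITIVE_ACTIONS else "idle"


if __name__ == "__main__":
    print(select_action("That's great news!", "User shared a success story."))
```

Constraining the model to a closed inventory of primitives is what keeps the approach portable: any embodied system that implements the same action labels can reuse the selection logic unchanged.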