ISCA Archive Interspeech 2025

Leveraging LLMs for Written to Spoken Style Data Transformation to Enhance Spoken Dialog State Tracking

Haris Gulzar, Monikka Roslianna Busto, Akiko Masaki, Takeharu Eda, Ryo Masumura

Dialog State Tracking (DST) is a core component of Task-Oriented Dialog (TOD) systems, as it must navigate the complex flow of human conversation to accomplish a task. Most TOD systems are trained on written-style text data, and their performance degrades sharply when deployed in spoken scenarios due to natural disfluencies and speech recognition errors. Labeled spoken-style TOD data is scarce because of high data collection costs and privacy concerns. As Large Language Models (LLMs) emerge as a tool for synthetic text data generation, we explore their capability to generate spoken-style text-based TOD data. Through carefully crafted LLM prompts, our generated labeled spoken-style TOD data improves the Joint Goal Accuracy (JGA) of dedicated DST models by 3.39% absolute (11.6% relative). In this work, we showcase our divide-and-conquer data generation strategies and DST training approach for improving the performance of task-specific dialog models.