ISCA Archive Interspeech 2012

Using reinforcement learning for dialogue management policies: towards understanding MDP violations and convergence

Peter A. Heeman, Jordan Fryer, Rebecca Lunsford, Andrew Rueckert, Ethan Selfridge

Reinforcement learning (RL) is becoming a popular tool for building dialogue managers. This paper addresses two issues in using RL. First, we propose two methods for finding Markov decision process (MDP) violations, both of which compute Q scores while testing the policy. Second, we investigate how convergence happens. To do this, we use a dialogue task in which the only source of variability is the dialogue policy itself, which allows us to study how and when convergence happens as training progresses. The work in this paper should help dialogue designers build effective policies and understand how much training is necessary.