Natural turn-taking behavior plays a crucial part in the user
experience of modern spoken dialogue systems. One way to build such a
system is to learn these behaviors from real-world human-to-human
dialogues, which contain more diverse and fine-grained turn-taking
actions than any manually constructed sessions.
In this paper, we propose a dataset, FTAD, which can be used to learn
turn-taking policies directly from humans. First, we design an
annotation mechanism that transforms existing human-to-human dialogue
sessions into structured data while preserving their most fine-grained
turn-taking actions. We then explore a set of supervised learning tasks
on this data, showing both the challenges and the potential of learning
complete fine-grained turn-taking policies from it.