Observational studies on couple interactions are often based on manual
annotations of a set of behavior codes. Such annotations are expensive,
time-consuming, and often suffer from low inter-annotator agreement.
Previous studies have shown that the lexical channels contain
sufficient information for capturing behavior and predicting the interaction
labels, and various automated processes using language models have
been proposed. However, current methods are restricted to a small context
window due to the difficulty of training language models with limited
data as well as the lack of frame-level labels. In this paper we investigate
the application of recurrent neural networks for capturing behavior
trajectories through larger context windows. We address
data sparsity and improve robustness by introducing out-of-domain knowledge
through pretrained word representations. Finally, we show that our
system can accurately estimate true rating values of couples' interactions
using a fusion of the frame-level behavior trajectories. The ratings
predicted by our proposed system achieve inter-annotator agreements
comparable to those of trained human annotators.
Importantly, our system
promises robust handling of out-of-domain data, exploitation of longer
context, online feedback with continuous labels, and easy fusion with
other modalities.