Observational studies rely on accurate assessment of human state.
A behavior recognition system that models interlocutors' state
in real time can significantly aid the mental health domain. However,
behavior recognition from speech remains a challenging task: it
is difficult to find generalizable and representative features in
noisy, high-dimensional data, especially when the data are limited
and coarsely, subjectively annotated. Deep Neural Networks (DNNs)
have shown promise in a wide range of machine learning tasks, but
their application to Behavioral Signal Processing (BSP) has been
constrained by the limited quantity of available data.
We propose a Sparsely-Connected
and Disjointly-Trained DNN (SD-DNN) framework to deal with limited
data. First, we break the acoustic feature set into subsets and train
multiple distinct classifiers. Then, the hidden layers of these classifiers
become part of a deeper network that integrates all feature streams.
The overall system allows for full connectivity while limiting the
number of parameters trained at any given time, making convergence
possible even with limited data. We present results on multiple behavior codes
in the couples' therapy domain and demonstrate gains in
behavior classification accuracy. We also show the viability of this
system for live behavior annotation.
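To make the two-stage scheme concrete, the following is a minimal PyTorch sketch of the idea: small classifiers are first trained disjointly on separate feature subsets, and their hidden layers are then reused inside a deeper integrating network. All specifics here (feature dimensionality, the two-way subset split, layer sizes, the choice to update only the integration layers in stage 2, and hyperparameters) are illustrative assumptions, not details taken from the paper.

# Illustrative sketch of the SD-DNN two-stage training idea; all
# dimensions, splits, and hyperparameters below are hypothetical.
import torch
import torch.nn as nn

class SubNet(nn.Module):
    """Stage 1: a small classifier trained on one acoustic feature subset."""
    def __init__(self, in_dim, hidden_dim=64, n_classes=2):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.out = nn.Linear(hidden_dim, n_classes)

    def forward(self, x):
        return self.out(self.hidden(x))

class SDDNN(nn.Module):
    """Stage 2: hidden layers of the disjointly trained sub-networks feed
    a deeper network that integrates all feature streams."""
    def __init__(self, subnets, subset_slices, hidden_dim=64, n_classes=2):
        super().__init__()
        self.hiddens = nn.ModuleList([s.hidden for s in subnets])
        self.slices = subset_slices
        fused_dim = hidden_dim * len(subnets)
        self.integrator = nn.Sequential(
            nn.Linear(fused_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_classes))

    def forward(self, x):
        # Each sub-network sees only its own feature subset (sparse connectivity).
        streams = [h(x[:, sl]) for h, sl in zip(self.hiddens, self.slices)]
        return self.integrator(torch.cat(streams, dim=1))

def train(model, x, y, params, epochs=5):
    opt = torch.optim.Adam(params, lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

# Toy data: 88-dimensional acoustic features split into two disjoint subsets.
x, y = torch.randn(32, 88), torch.randint(0, 2, (32,))
slices = [slice(0, 44), slice(44, 88)]

# Stage 1: train each sub-classifier disjointly on its feature subset.
subnets = [SubNet(44) for _ in slices]
for net, sl in zip(subnets, slices):
    train(net, x[:, sl], y, net.parameters())

# Stage 2: reuse the pretrained hidden layers and train only the
# integration layers, keeping the number of parameters updated at any
# time small (whether to later fine-tune everything is an open choice).
sd_dnn = SDDNN(subnets, slices)
train(sd_dnn, x, y, sd_dnn.integrator.parameters())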