Observational therapy is an important element of mental health care that relies on detailed assessment of multiple behavioral cues. Unfortunately, behavioral coding for research in this field is often available only at session-level resolution, owing to the inherent cost of labeling and the subjectivity of human annotators. Modeling the interlocutors' behavior at a fine temporal resolution, and analyzing how such behavioral changes affect the gestalt perception, can help psychologists better understand the underlying behavioral mechanisms. In this paper, we propose a method to model the dynamically evolving behavior of interlocutors during couple interactions. We first present a static behavioral model based on local decisions with global fusion, and investigate how the frame length affects the quality of the global evaluation. We then propose a two-layer sequential Hidden Markov Model to capture local state transitions. Using a corpus of Couple Therapy interactions as a case study, we find that an interlocutor does not express a single behavior throughout a conversation and that neighboring frames are temporally correlated. We show that the dynamic models can achieve up to 10% relative improvement over the static models. This suggests that human behavioral interaction is a non-linear process, and that the resulting latent-state labels may offer new insights to domain experts.