ISCA Archive Interspeech 2025

Identifying Vocal and Facial Biomarkers of Depression in Large-Scale Remote Recordings: A Multimodal Study Using Mixed-Effects Modeling

Nelson Hidalgo Julia, Robert Lewis, Craig Ferguson, Simon Goldberg, Wendy Lau, Caroline Swords, Gabriela Valdivia, Christine Wilson-Mendenhall, Raquel Tartar, Rosalind Picard, Richard Davidson

We examine vocal and facial data from a new study with n = 954 depressed participants, each characterized by six time points of the eight-item Patient Health Questionnaire (PHQ-8). Participants interacted with a smartphone app over four weeks, with a three-month follow-up. The app's animated character asked participants to describe, for 90 seconds, an emotional experience from the past 24 hours. We obtained 4,875 audio-video recordings and applied linear mixed-effects models to examine associations between depression severity and 30 acoustic, linguistic, and facial action unit features. Significant associations were found with speech timing and prosody, voice quality, linguistic sentiment, the use of self-referential pronouns, and facial action units related to smiling. We also show that these features enable accurate estimation of depression severity in multimodal mixed-effects machine learning models.
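The kind of analysis described above, relating repeated PHQ-8 scores to a per-recording feature while accounting for within-participant correlation, can be sketched with a linear mixed-effects model that includes a random intercept per participant. The sketch below uses synthetic data and the statsmodels MixedLM API; the feature name (`speech_rate`) and the data-generating process are illustrative assumptions, not the authors' actual features or pipeline.

```python
# Minimal mixed-effects sketch: PHQ-8 severity ~ one vocal feature,
# with a random intercept per participant (synthetic data throughout).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_participants, n_sessions = 100, 6  # repeated measures per participant
pid = np.repeat(np.arange(n_participants), n_sessions)

# Hypothetical acoustic feature and a negative true association with PHQ-8.
speech_rate = rng.normal(4.0, 0.5, size=pid.size)
subject_effect = rng.normal(0.0, 2.0, size=n_participants)[pid]
phq8 = 10.0 - 2.0 * (speech_rate - 4.0) + subject_effect + rng.normal(0.0, 1.0, size=pid.size)

df = pd.DataFrame({"pid": pid, "speech_rate": speech_rate, "phq8": phq8})

# groups= specifies the random-intercept grouping (participant ID),
# which models the correlation among each participant's six time points.
model = smf.mixedlm("phq8 ~ speech_rate", df, groups=df["pid"])
result = model.fit()
slope = result.params["speech_rate"]
print(f"fixed-effect slope for speech_rate: {slope:.2f}")
```

The fitted fixed-effect slope recovers the simulated negative association; in the actual study, one such model would be fit per feature (or jointly across modalities) with PHQ-8 as the outcome.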