Machine learning models for speech-based depression classification
offer promise for health care applications. Despite growing work on
depression classification, little is understood about how the length
of the speech input affects model performance. We analyze results for speaker-independent
depression classification using a corpus of over 1400 hours of speech
from a human-machine health screening application. We examine performance
as a function of response input length for two NLP systems that differ
in overall performance. Results for both systems show that performance
depends on natural length, elapsed length, and
ordering of the response within a session. The two systems share a minimum
length threshold but differ in their response saturation threshold, which
is higher for the better-performing system. At saturation, it is better
to pose a new question to the speaker than to continue the current
response. These and additional reported results suggest how applications
can be better designed to both elicit and process optimal input lengths
for depression classification.