ISCA Archive Interspeech 2024

Revealing Confounding Biases: A Novel Benchmarking Approach for Aggregate-Level Performance Metrics in Health Assessments

Stefano Goria, Roseline Polle, Salvatore Fara, Nicholas Cummins

Numerous speech-based health assessment studies report high accuracy rates for machine learning models that detect conditions such as depression and Alzheimer’s disease. There are growing concerns that these reported performances are often overestimated, especially in small-scale cross-sectional studies. Possible causes of this overestimation include overfitting, publication bias, and a lack of standard procedures for reporting findings and testing methodology. Another key source of misrepresentation is the reliance on aggregate-level performance metrics. Speech is a highly variable signal that can be affected by factors including age, sex, and accent, all of which can easily bias models. We highlight this effect by presenting a simple benchmark model for assessing the extent to which aggregate metrics exaggerate the efficacy of a machine learning model in the presence of confounders. We then demonstrate the usefulness of this benchmark on exemplar speech-health assessment datasets.
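The confounding effect the abstract describes can be illustrated with a minimal sketch (not the paper's benchmark model; all names, the cohort, and the 80% confounding rate are hypothetical). A "model" that predicts depression purely from a confounder such as sex achieves a strong aggregate balanced accuracy on a cohort where the label is correlated with that confounder, yet is exactly at chance within every confounder subgroup:

```python
import random

random.seed(0)

# Hypothetical toy cohort (not from the paper): depression status is
# confounded with sex, e.g. 80% of depressed participants are female.
def make_cohort(n=1000, confound=0.8):
    cohort = []
    for _ in range(n):
        depressed = random.random() < 0.5
        if depressed:
            sex = "F" if random.random() < confound else "M"
        else:
            sex = "M" if random.random() < confound else "F"
        cohort.append((sex, depressed))
    return cohort

# A "model" that has learned nothing about speech pathology:
# it predicts depression purely from the confounder.
def confounded_model(sex):
    return sex == "F"

def balanced_accuracy(samples):
    # Mean of sensitivity and specificity for the confounded model.
    pos = [s for s, y in samples if y]
    neg = [s for s, y in samples if not y]
    tpr = sum(confounded_model(s) for s in pos) / len(pos)
    tnr = sum(not confounded_model(s) for s in neg) / len(neg)
    return (tpr + tnr) / 2

cohort = make_cohort()

# Aggregate balanced accuracy looks strong (close to the confounding rate)...
print(f"aggregate: {balanced_accuracy(cohort):.2f}")

# ...but within each sex subgroup the model is exactly at chance,
# because it assigns the same prediction to everyone in that subgroup.
for sex in ("F", "M"):
    group = [(s, y) for s, y in cohort if s == sex]
    print(f"sex={sex}: {balanced_accuracy(group):.2f}")
```

The per-subgroup check is the key diagnostic: a model with no genuine signal about the condition collapses to a constant predictor inside each confounder stratum, which an aggregate metric alone cannot reveal.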