ISCA Archive Interspeech 2023

ReCLR: Reference-Enhanced Contrastive Learning of Audio Representation for Depression Detection

Pingyue Zhang, Mengyue Wu, Kai Yu

Contrastive self-supervised learning has seen great success in computer vision but has been less investigated in audio processing, in particular for depression detection, a socially critical challenge. Depression detection from speech has been examined with various audio representations, including acoustic feature combinations and model-based representations. This paper proposes to obtain audio representations for depression detection by separating speech samples using reference features from an emotion recognition model. Furthermore, we propose reference-enhanced contrastive learning (ReCLR), which selects fine-grained positive instances and assigns weights to negative instances. The depression detection results indicate that contrastive learning is effective for this audio task. Moreover, the proposed ReCLR strategy outperforms contrastive training without references.
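
The idea of using reference features to choose positives and weight negatives can be illustrated with a minimal sketch. This is not the formulation from the paper, which the abstract does not specify. It assumes an InfoNCE-style loss in which two samples count as a positive pair when the cosine similarity of their reference features exceeds a threshold, and in which each negative is weighted by one minus that similarity. The function name `reference_weighted_info_nce`, the parameters `temperature` and `pos_threshold`, and the weighting rule are all hypothetical choices made for illustration.

```python
import numpy as np


def reference_weighted_info_nce(z, ref, temperature=0.1, pos_threshold=0.8):
    """Illustrative reference-weighted contrastive loss (assumed form, not the paper's).

    z   : (n, d) array of learned audio embeddings, rows must be non-zero.
    ref : (n, k) array of reference features (e.g. from an emotion model).
    """
    # L2-normalise both sets of vectors so dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)

    n = z.shape[0]
    sim = z @ z.T / temperature          # embedding similarity logits
    ref_sim = ref @ ref.T                # reference-feature similarity
    eye = np.eye(n, dtype=bool)

    # Assumption: positives are pairs whose reference features are close.
    pos_mask = (ref_sim >= pos_threshold) & ~eye
    neg_mask = ~pos_mask & ~eye

    # Assumption: negatives with less similar reference features weigh more.
    neg_weight = np.where(neg_mask, 1.0 - ref_sim, 0.0)

    # Shift logits by a constant for numerical stability; the loss ratio
    # below is invariant to this shift.
    exp_sim = np.exp(sim - sim.max())

    losses = []
    for i in range(n):
        if not pos_mask[i].any():
            continue  # anchor has no positive under the threshold
        neg_term = (neg_weight[i] * exp_sim[i]).sum()
        pos_exp = exp_sim[i][pos_mask[i]]
        losses.append(np.mean(-np.log(pos_exp / (pos_exp + neg_term))))

    # Return 0.0 when no anchor has any positive.
    return float(np.mean(losses)) if losses else 0.0
```

Under these assumptions, the loss is small when samples with similar reference features also have similar embeddings, and large when the embeddings instead pair up samples whose reference features differ.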