Contrastive self-supervised learning has gained attention for its ability to create high-quality representations from large unlabelled data sets. A key reason that these powerful features enable data-efficient learning of downstream tasks is that they provide augmentation invariance, which is often a useful inductive bias. However, the amount and type of invariance preferred are not known a priori and vary across downstream tasks. We therefore propose a multi-task self-supervised framework (MT-SLVR) that learns both variant and invariant features in a parameter-efficient manner. Our multi-task representation provides a strong and flexible feature that benefits diverse downstream tasks. We evaluate our approach on few-shot classification tasks drawn from a variety of audio domains and demonstrate improved classification performance on all of them.
Cite as: Heggan, C., Hospedales, T., Budgett, S., Yaghoobi, M. (2023) MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations. Proc. INTERSPEECH 2023, 4399-4403, doi: 10.21437/Interspeech.2023-1064
@inproceedings{heggan23_interspeech,
  author={Calum Heggan and Tim Hospedales and Sam Budgett and Mehrdad Yaghoobi},
  title={{MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={4399--4403},
  doi={10.21437/Interspeech.2023-1064}
}