ISCA Archive Interspeech 2025

Multi-view Fusion and Parameter Perturbation for Few-Shot Class-Incremental Audio Classification

Yulu Fang, Mingyue He, Qisheng Xu, Jianqiao Zhao, Cheng Yang, Kele Xu, Yong Dou

Audio classification tasks typically assume a fixed number of classes, which is often unrealistic in real-world applications where the target class vocabulary is dynamic or unknown in advance. A significant challenge arises when models must adapt to new classes incrementally, as this process is prone to catastrophic forgetting, a sharp decline in performance on previously learned classes, especially in data-scarce scenarios. While dynamic network-based methods and prototype refinement-based methods have been proposed to address these challenges, they overlook two critical issues: (1) inadequate representation of raw audio samples, which limits generalization, and (2) the risk of overfitting, which limits adaptivity. In this paper, we propose Multi-View Fusion and Parameter Perturbation (MVF2P), a novel framework that leverages the complementary learning system to enhance generalizability and adaptivity within a unified incremental learning framework. MVF2P addresses the limitations of existing methods by integrating multi-view learning to enrich feature representations and a parameter perturbation mechanism to reduce overfitting. Extensive evaluations on two widely used audio datasets, NS-100 and LS-100, demonstrate that MVF2P outperforms state-of-the-art methods in terms of average accuracy and performance drop rate. Notably, MVF2P not only mitigates catastrophic forgetting more effectively but also enhances the model's adaptability to new classes, making it a robust solution for dynamic audio classification tasks.
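The abstract describes the two mechanisms only at a high level. A minimal illustrative sketch of what they might look like is given below; the function names, the averaging fusion rule, and the Gaussian noise scale are all assumptions for illustration, not the authors' actual implementation.

```python
import random

def fuse_views(views):
    # Multi-view fusion (illustrative): combine several feature vectors
    # extracted from the same audio clip (e.g., different spectrogram
    # views) by element-wise averaging into one enriched representation.
    dim = len(views[0])
    return [sum(v[i] for v in views) / len(views) for i in range(dim)]

def perturb_parameters(params, sigma=0.01, seed=None):
    # Parameter perturbation (illustrative): add small Gaussian noise to
    # model weights before adapting to a new few-shot session, acting as
    # a regularizer that can reduce overfitting to the few new samples.
    rng = random.Random(seed)
    return [p + rng.gauss(0.0, sigma) for p in params]

# Example: fuse two feature views, then lightly perturb the result.
fused = fuse_views([[1.0, 2.0], [3.0, 4.0]])   # [2.0, 3.0]
adapted = perturb_parameters(fused, sigma=0.01, seed=0)
```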