This paper presents the ViVoLab system designed for emotion classification in the Odyssey 2024 Emotion Recognition Challenge (OERC2024). Our participation is motivated by exploring the efficacy of novel sequence modeling architectures, with a specific focus on Selective State Space Models (SSMs), known as Mamba, for the challenging task of speech emotion recognition. We aim to investigate whether Mamba can outperform traditional sequence modeling techniques such as the class-token Transformer. To this end, the classification model employed for this task integrates an SSM in conjunction with a feed-forward layer and a self-attention mechanism. SSMs have demonstrated performance comparable to Transformer models in language and audio tasks, with notable advantages in training and inference efficiency. Additionally, various data augmentation techniques, including additive and convolutional noise as well as SpecAugment, are applied to mitigate overfitting. The proposed model achieves an F1-Macro score of 29.42% on the MSP-Podcast test set, a performance level comparable to that of the baseline system established for the challenge.