This paper addresses Task 2, Emotional Attribute Prediction, of the Speech Emotion Recognition in Naturalistic Conditions Challenge at INTERSPEECH 2025, and presents a simple and effective method named Interactive Fusion of Multi-View Speech Embeddings (IF-MVSE). In this method, pretrained large-scale speech models are first used to extract multi-view speech embeddings, capturing complementary representations from multiple perspectives of the speech signal. We then design an interactive fusion strategy, consisting of dual-feature interactive attention and multi-view self-balancing gated operations, to integrate and enhance these multi-view embeddings for predicting dimensional emotion attributes. IF-MVSE achieved an average concordance correlation coefficient (CCC) of 0.5955 on the official test set, securing third place in this track.
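To make the two fusion components named above concrete, the following is a minimal NumPy sketch of one plausible realization: single-head dot-product cross-attention between two embedding views ("dual-feature interactive attention") and a per-view sigmoid gate normalized across views ("self-balancing gating"). All function names, shapes, and the gate parameterization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_view, kv_view, d):
    # queries from one view attend to keys/values from the other view;
    # both views have shape (T, d)
    scores = q_view @ kv_view.T / np.sqrt(d)     # (T, T)
    return softmax(scores, axis=-1) @ kv_view    # (T, d)

def gated_fusion(views, W_g, b_g):
    # self-balancing gate: a sigmoid weight per view and time step,
    # normalized across views so the weights sum to 1
    stacked = np.stack(views)                           # (n_views, T, d)
    gates = 1.0 / (1.0 + np.exp(-(stacked @ W_g + b_g)))  # (n_views, T, 1)
    gates = gates / gates.sum(axis=0, keepdims=True)
    return (gates * stacked).sum(axis=0)                # (T, d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 5, 8
    view_a = rng.normal(size=(T, d))   # e.g. embeddings from one pretrained model
    view_b = rng.normal(size=(T, d))   # e.g. embeddings from another pretrained model
    a_att = cross_attention(view_a, view_b, d)
    b_att = cross_attention(view_b, view_a, d)
    fused = gated_fusion([a_att, b_att], rng.normal(size=(d, 1)), 0.0)
    print(fused.shape)  # (5, 8)
```

In this sketch, each view is refined with information from the other before a learned gate balances their contributions; a regression head over `fused` would then predict the dimensional emotion attributes.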