Large-scale self-supervised learning models have proven highly effective at extracting robust features for detecting both genuine and spoofed speech. However, leveraging these features remains challenging, particularly in balancing in-domain specialization with out-of-domain generalization. In this work, we propose an effective approach that automatically selects features via a self-gating mechanism and aggregates speech representations from a pretrained foundation model to enhance deepfake detection. Our approach integrates a multi-kernel gated convolution module to improve feature learning and facilitate feature fusion. Additionally, we employ Mamba to capture both short- and long-range discriminative patterns in speech. The proposed method achieves strong performance in audio deepfake detection, demonstrating improved generalization across diverse datasets. The source code will be made available on GitHub.
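To make the self-gated, multi-kernel idea concrete, here is a minimal NumPy sketch of one plausible form of a multi-kernel gated convolution over a 1-D feature sequence. The kernel sizes, random weights, sigmoid self-gate, and fusion by averaging are illustrative assumptions for exposition, not the paper's exact module.

```python
import numpy as np

def conv1d_same(x, kernel):
    # "same"-padded 1-D convolution (assumes odd kernel length).
    pad = len(kernel) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(kernel)], kernel)
                     for i in range(len(x))])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multi_kernel_gated_conv(x, kernel_sizes=(3, 5, 7), rng=None):
    # Illustrative sketch: each branch convolves the input at a different
    # receptive field, gates itself with a sigmoid of a parallel convolution,
    # and the branches are fused by averaging.
    if rng is None:
        rng = np.random.default_rng(0)  # fixed seed for reproducibility
    branches = []
    for k in kernel_sizes:
        feat = conv1d_same(x, rng.standard_normal(k) / k)           # feature path
        gate = sigmoid(conv1d_same(x, rng.standard_normal(k) / k))  # self-gate path
        branches.append(feat * gate)                                # gated features
    return np.mean(branches, axis=0)                                # fuse branches
```

In a trained model the per-branch kernels would be learned parameters, and the gates let the network suppress or pass features from each receptive-field scale before fusion.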