Autism Spectrum Disorder (ASD) is a complex neurodevelopmental disorder, and mouse models have become essential for studying its genetic and behavioural aspects. Ultrasonic Vocalisations (USVs) emitted by mice are a promising biomarker for ASD detection, but existing methods that rely on spectrogram-based features struggle to capture the complex, non-stationary, and multi-scale nature of USVs. To address this, we propose a novel multi-branch fusion model that integrates spectrogram-based features with multi-scale features extracted using Empirical Mode Decomposition (EMD), which decomposes each USV signal into Intrinsic Mode Functions (IMFs) that better represent its inherent complexity. Through systematic occlusion experiments, we identify the high-frequency components, particularly IMF1 (the highest-frequency IMF), as critical for accurate ASD detection, highlighting the diagnostic relevance of high-frequency USV patterns. Our model achieves an Unweighted Average Recall (UAR) of 0.75 in subject-level classification, significantly outperforming existing methods. These findings provide insight into the importance of multi-scale feature extraction and offer a robust framework for improving ASD diagnostics and research.
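UAR, the metric reported above, is the mean of the per-class recalls. Every class therefore carries equal weight regardless of how many subjects it contains, which makes UAR less sensitive to class imbalance than plain accuracy. The following is a minimal illustrative sketch of the metric, not the authors' evaluation code. The function name and the example labels (1 = ASD model, 0 = control) are hypothetical. In the usual case the result matches scikit-learn's `balanced_accuracy_score`.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; each class in y_true is weighted equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    support = defaultdict(int)  # number of true samples per class
    hits = defaultdict(int)     # correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            hits[t] += 1
    # Recall for class c = TP_c / (TP_c + FN_c) = hits[c] / support[c]
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Hypothetical subject-level labels: 1 = ASD model, 0 = control.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(unweighted_average_recall(y_true, y_pred))  # prints 0.625
```

In this toy case, accuracy would be 4/6 ≈ 0.667. UAR averages the ASD-class recall (3/4) and the control-class recall (1/2) to give 0.625, which exposes the weaker performance on the minority class.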