Dysphonia encompasses a broad spectrum of voice disorders with diverse etiologies, among which adductor spasmodic dysphonia (ADSD) and primary muscle tension dysphonia (pMTD) are particularly challenging to diagnose. At present, diagnosis relies primarily on subjective auditory-perceptual assessment by highly experienced clinicians. To alleviate the scarcity of such diagnostic expertise, this study develops a deep learning-based approach that automatically diagnoses ADSD and pMTD from patients' speech data. Our contributions are twofold: (1) we design a convolutional neural network (CNN)-based diagnostic model that leverages handcrafted features derived from expert knowledge, and (2) we incorporate self-supervised learning (SSL) to adaptively extract more discriminative input representations directly from raw waveforms. To the best of our knowledge, this is the first application of deep learning to the diagnostic modeling of ADSD and pMTD, and the proposed approach achieves a classification accuracy of 83.3% on our newly constructed dataset.