Recent advances in large language models (LLMs) have demonstrated strong reasoning abilities through chain-of-thought (CoT) prompting, yet their application to speech emotion recognition (SER) remains underexplored. Moreover, current SER models lack explainability grounded in emotion-related acoustic features. We propose AECoTD, a method that transfers reasoning ability from a general-purpose teacher LLM to a domain-specific SER LLM by leveraging fine-grained emotional acoustic features and text transcripts. It uses LoRA to distill the reasoning chain and an emotion-focused loss to preserve correct emotional attention, thereby enhancing the model's explainability. Ablation experiments confirm the contributions of fine-grained acoustic information, emotional CoT reasoning, and the emotion-focused loss. Without using pre-trained representations, our method achieves state-of-the-art performance both in-domain and out-of-domain, demonstrating strong generalization ability.
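To make the training objective concrete, a minimal sketch of how a distillation cross-entropy over CoT tokens might be combined with an emotion-focused term that up-weights emotion-bearing tokens. All function names, the token-level mask, and the exact weighting scheme are illustrative assumptions, not the paper's implementation:

```python
import math

def cross_entropy(probs, target):
    # Negative log-likelihood of the target token under the
    # student model's predicted distribution.
    return -math.log(probs[target])

def distillation_loss(token_probs, targets, emotion_mask, lam=0.5):
    """Combined loss: average CE over all CoT tokens, plus an
    auxiliary term averaged over emotion-bearing tokens only
    (marked by emotion_mask). `lam` balances the two terms;
    this weighting scheme is a hypothetical sketch."""
    ce = sum(cross_entropy(p, t)
             for p, t in zip(token_probs, targets)) / len(targets)
    emo_terms = [cross_entropy(p, t)
                 for p, t, m in zip(token_probs, targets, emotion_mask) if m]
    emo = sum(emo_terms) / len(emo_terms) if emo_terms else 0.0
    return ce + lam * emo

# Toy example: 3 tokens over a 3-word vocabulary; the second
# token is an emotion-bearing word.
loss = distillation_loss(
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],
    targets=[0, 1, 2],
    emotion_mask=[0, 1, 0],
)
```

In practice such an objective would be computed over the student's logits during LoRA fine-tuning; the point of the extra term is that mispredicting emotion-bearing tokens is penalized more than mispredicting filler tokens in the reasoning chain.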