Cross-domain speech emotion recognition (SER), which uses data from a source domain to recognize emotions in a target domain, has received significant attention in recent years. In this paper, we propose a novel unsupervised transfer learning method, unsupervised transfer components learning (UTCL), for cross-domain SER. Specifically, we first learn a common projection for the cross-domain data by applying a PCA-like strategy to the source and target domains separately. Meanwhile, we design a simple strategy that ensures all cross-domain samples share similar manifold structures, so that the learned common projection preserves more transfer components. Furthermore, we design a novel adaptive structured graph strategy to further narrow the gap between the cross-domain samples. Comprehensive experiments on several benchmark datasets demonstrate that our method achieves better performance than several state-of-the-art methods.