Since each of the currently available emotional speech corpora is too small to cover personal or cultural diversity, multiple emotional speech corpora can be used jointly to train a speech emotion recognition (SER) model that is robust to unseen corpora. Each corpus has different characteristics, such as whether the speech is acted or spontaneous, the environment in which it was recorded, and its lexical content. Depending on these characteristics, the emotion recognition accuracy and the time required to train a model differ across corpora. If the SER model is trained with all corpora weighted equally, the classification performance for each training corpus will differ. The performance on unseen corpora may be enhanced if the model is trained to show similar recognition accuracy for each training corpus, since the training corpora cover different characteristics. In this study, we propose to adopt corpus-wise weights in the classification loss, defined as functions of the recognition accuracy on each training corpus. We also adopt pseudo-emotion labels for an unlabeled speech corpus to further enhance performance. Experimental results showed that the proposed method outperformed previously proposed approaches in out-of-corpus SER, using three emotional corpora for training and one corpus for evaluation.
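To make the corpus-wise weighting idea concrete, the following is a minimal sketch of a weighted classification loss in PyTorch. The specific weighting function (inverse of per-corpus accuracy, renormalized) and the names `corpus_weighted_loss`, `corpus_ids`, and `corpus_acc` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def corpus_weighted_loss(logits, labels, corpus_ids, corpus_acc):
    """Hypothetical corpus-wise weighted cross-entropy.

    logits:     (B, C) emotion class logits
    labels:     (B,)   emotion labels
    corpus_ids: (B,)   index of the source corpus for each utterance
    corpus_acc: (K,)   current recognition accuracy per training corpus
    """
    # Assumed weighting: weight each corpus inversely to its current accuracy,
    # so corpora the model recognizes poorly contribute more to the loss.
    weights = 1.0 / corpus_acc.clamp(min=1e-3)            # (K,)
    weights = weights / weights.sum() * len(corpus_acc)   # normalize to mean 1
    per_sample = F.cross_entropy(logits, labels, reduction="none")  # (B,)
    return (weights[corpus_ids] * per_sample).mean()
```

In such a scheme, `corpus_acc` would typically be re-estimated periodically (e.g., per epoch on held-out data from each training corpus) so that the weights track the model's current per-corpus performance.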