Speech emotion recognition (SER) has drawn considerable attention due to its applications across diverse domains. A current trend in SER is to leverage embeddings from pre-trained models (PTMs) as input features to downstream models. However, embeddings from speaker recognition PTMs have received little focus compared to other PTM embeddings. To fill this gap and to understand the efficacy of speaker recognition PTM embeddings, we perform a comparative analysis of five PTM embeddings. Among these, x-vector embeddings performed the best, likely because their speaker-recognition training captures components of speech such as tone and pitch. Our modeling approach, which uses x-vector embeddings and mel-frequency cepstral coefficients (MFCC) as input features, is the most lightweight approach while achieving accuracy comparable to previous state-of-the-art (SOTA) methods on the CREMA-D benchmark.
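The pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's actual model: the feature dimensions, the mean-pooling of MFCC frames, the single linear layer, and the use of random weights are all assumptions made for the sketch; in practice the x-vector would come from a speaker-recognition PTM and the MFCCs from the raw waveform, and the classifier would be trained.

```python
import numpy as np

def classify_emotion(xvector, mfcc_frames, W, b):
    """Lightweight downstream classifier sketch: concatenate an
    utterance-level x-vector with mean-pooled MFCCs, then apply a
    single linear layer followed by a softmax over emotion classes."""
    mfcc_mean = mfcc_frames.mean(axis=0)             # (n_mfcc,)
    features = np.concatenate([xvector, mfcc_mean])  # (d_xvec + n_mfcc,)
    logits = features @ W + b                        # (n_classes,)
    exp = np.exp(logits - logits.max())              # numerically stable softmax
    return exp / exp.sum()

# Illustrative shapes only (assumed, not taken from the paper):
rng = np.random.default_rng(0)
xvec = rng.standard_normal(512)        # hypothetical 512-dim x-vector
mfcc = rng.standard_normal((300, 40))  # hypothetical 300 frames x 40 MFCCs
W = rng.standard_normal((552, 6)) * 0.01  # 6 emotion classes, as in CREMA-D
b = np.zeros(6)
probs = classify_emotion(xvec, mfcc, W, b)  # probability per emotion class
```

The design point the sketch reflects is that the heavy lifting happens in the frozen PTM; the downstream model on top of the concatenated features can stay small, which is what makes the approach lightweight.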