Automatic pronunciation assessment is a critical component of computer-assisted language learning. Models for pronunciation assessment typically require labeled data, which is difficult to obtain because labeling requires expert annotators. It is therefore essential to build accurate models from less annotated data. In this work, an approach based on the i-vector framework is proposed that requires only a few speech samples. Each sample is first lengthened by a factor of T by concatenating augmented versions of the same speech. The augmented versions are obtained using time-scale modification (TSM), pitch-scale modification (PSM), or both. Next, the phoneme-level goodness-of-pronunciation (GoP) scores of the concatenated speech are converted to a vector (GoP2Vec) using the i-vector framework. Experiments on two datasets show that the proposed GoP2Vec outperforms state-of-the-art (SOTA) unsupervised methods and, when used to train a simple neural model with a few samples, is on par with SOTA supervised methods.
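The lengthening step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the original sample is kept as the first segment, that T - 1 augmented copies are appended, and that modification factors are drawn at random from a small set. Plain overlap-add (OLA) stands in for whichever TSM algorithm the work actually uses, and PSM is approximated as TSM followed by resampling. The names `ola_tsm`, `pitch_shift`, `lengthen`, and the `rates` values are hypothetical.

```python
import numpy as np


def ola_tsm(x, rate, frame=512, hop_syn=128):
    """Time-scale modification by plain overlap-add (rate > 1 shortens)."""
    hop_ana = int(round(hop_syn * rate))           # analysis hop in the input
    win = np.hanning(frame)
    n_frames = max(1, (len(x) - frame) // hop_ana + 1)
    out = np.zeros((n_frames - 1) * hop_syn + frame)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        seg = x[i * hop_ana: i * hop_ana + frame]
        start = i * hop_syn                        # synthesis position
        out[start: start + frame] += seg * win
        norm[start: start + frame] += win
    return out / np.maximum(norm, 1e-8)            # undo window overlap gain


def pitch_shift(x, factor):
    """Scale pitch by `factor` while keeping the original length."""
    stretched = ola_tsm(x, rate=1.0 / factor)      # about `factor` times longer
    idx = np.linspace(0, len(stretched) - 1, len(x))
    return np.interp(idx, np.arange(len(stretched)), stretched)


def lengthen(x, T, mode="both", rates=(0.9, 1.1), seed=0):
    """Concatenate the original with T - 1 augmented copies (assumed scheme)."""
    rng = np.random.default_rng(seed)
    parts = [x]
    for _ in range(T - 1):
        y = x
        if mode in ("tsm", "both"):
            y = ola_tsm(y, rng.choice(rates))
        if mode in ("psm", "both"):
            y = pitch_shift(y, rng.choice(rates))
        parts.append(y)
    return np.concatenate(parts)


# Usage: a 1-second 220 Hz tone at 16 kHz, lengthened with T = 3.
sr = 16000
tone = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
longer = lengthen(tone, T=3, mode="both")
```

With TSM enabled, the output is only approximately T times the input length, because each augmented copy is slightly shorter or longer than the original. In the proposed pipeline, GoP scores would then be computed on the concatenated signal; that step is not sketched here.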