Accent recognition (AR) is challenging due to the lack of training data and the entanglement of accents with speaker and regional characteristics. This paper aims to improve AR performance from two perspectives. First, to alleviate the data insufficiency problem, we employ self-supervised learning representations (SSLRs) extracted from a pre-trained model to build the AR models. With the help of SSLRs, the AR models achieve significant performance improvements over traditional acoustic features. Second, we propose a persistent accent memory (PAM) that serves as contextual knowledge to bias the AR model. The accent embeddings extracted from all training data by the encoder of the AR model are clustered to form an accent codebook, i.e., the PAM. In addition, we propose diverse attention mechanisms to investigate the optimal utilization of the PAM. We observe that the best performance is obtained by selecting the most relevant accent embeddings.
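The following is a minimal sketch, not the paper's implementation, of how the PAM idea described above could be realized under the assumption that the codebook is built with k-means over encoder-derived accent embeddings and that "selecting the most relevant accent embeddings" is done via top-k dot-product attention. The function names, codebook size, and `top_k` value are illustrative assumptions.

```python
# Sketch of a persistent accent memory (PAM): cluster accent embeddings into a
# codebook, then bias an utterance-level query with its most relevant entries.
# Hyperparameters (n_clusters, top_k) are illustrative, not from the paper.
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans


def build_pam(train_accent_embeddings: torch.Tensor, n_clusters: int = 64) -> torch.Tensor:
    """Cluster accent embeddings from the training set into a PAM codebook."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10)
    kmeans.fit(train_accent_embeddings.cpu().numpy())
    # Codebook of shape (n_clusters, embed_dim): one entry per accent cluster.
    return torch.from_numpy(kmeans.cluster_centers_).float()


def attend_to_pam(query: torch.Tensor, pam: torch.Tensor, top_k: int = 8) -> torch.Tensor:
    """Bias the query accent embedding with its most relevant PAM entries.

    query: (batch, embed_dim) accent embedding of the current utterance
    pam:   (n_clusters, embed_dim) persistent accent memory
    """
    scores = query @ pam.t() / pam.size(-1) ** 0.5          # (batch, n_clusters)
    top_scores, top_idx = scores.topk(top_k, dim=-1)        # keep most relevant entries
    weights = F.softmax(top_scores, dim=-1)                  # (batch, top_k)
    selected = pam[top_idx]                                   # (batch, top_k, embed_dim)
    context = (weights.unsqueeze(-1) * selected).sum(dim=1)  # (batch, embed_dim)
    return query + context                                    # biased accent representation
```

In this reading, the top-k selection corresponds to the observation that attending only to the most relevant accent embeddings, rather than the whole codebook, yields the best performance.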