Voice conversion (VC) converts a source speaker’s voice so that it sounds like a target speaker’s, without changing the linguistic content. Recent work shows that VC frameworks based on phonetic posteriorgrams (PPGs) achieve promising speaker similarity and speech quality. In practice, however, we find that the trajectories of some generated waveforms are not smooth, which causes voice errors and degrades the sound quality of the converted speech. In this paper, we extend existing PPG-based VC methods to improve performance. Specifically, we propose a new auto-regressive model for any-to-one VC, called Auto-Regressive Voice Conversion (ARVC). In contrast to conventional PPG-based VC, ARVC takes the acoustic features of the previous step as input to produce the output of the next step through its auto-regressive structure. Experimental results on the CMU-ARCTIC dataset show that our method improves both the speech quality and the speaker similarity of the converted speech.
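
To make the auto-regressive idea concrete, the following is a minimal sketch of a decoding loop in which each acoustic frame is computed from the current PPG frame and the previously generated acoustic frame. It is not the ARVC architecture: the function names, the dimensions, the random weights, the all-zero initial frame, and the single linear layer with a tanh non-linearity are all assumptions chosen only to illustrate the feedback structure.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical stand-in for trained weights: small uniform random values."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def autoregressive_decode(ppg_frames, w_ppg, w_prev, bias):
    """Produce one acoustic frame per PPG frame, feeding each output back in.

    Assumed toy update (not taken from the paper):
        frame_t = tanh(w_ppg @ ppg_t + w_prev @ frame_{t-1} + bias)
    """
    acoustic_dim = len(bias)
    prev_frame = [0.0] * acoustic_dim  # assumed all-zero frame for the first step
    outputs = []
    for ppg in ppg_frames:
        from_ppg = matvec(w_ppg, ppg)            # contribution of linguistic content
        from_prev = matvec(w_prev, prev_frame)   # contribution of the previous output
        frame = [math.tanh(a + b + c) for a, b, c in zip(from_ppg, from_prev, bias)]
        outputs.append(frame)
        prev_frame = frame  # auto-regressive feedback into the next step
    return outputs


# Toy dimensions, assumed for illustration only.
PPG_DIM, ACOUSTIC_DIM, NUM_FRAMES = 4, 3, 5
rng = random.Random(0)
w_ppg = random_matrix(ACOUSTIC_DIM, PPG_DIM, rng)
w_prev = random_matrix(ACOUSTIC_DIM, ACOUSTIC_DIM, rng)
bias = [0.0] * ACOUSTIC_DIM
ppg_frames = [[rng.random() for _ in range(PPG_DIM)] for _ in range(NUM_FRAMES)]

converted = autoregressive_decode(ppg_frames, w_ppg, w_prev, bias)
print(len(converted), len(converted[0]))
```

Setting `w_prev` to all zeros removes the feedback term, so each frame then depends only on its own PPG frame; this is one simple way to contrast the auto-regressive loop with a frame-by-frame mapping in this toy setting.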