Grapheme-to-phoneme conversion is an essential component of Text-To-Speech systems. In Chinese, one main challenge is polyphone disambiguation: determining the correct pronunciation of characters that have multiple pronunciations. For this task, the benchmark dataset Chinese Polyphone disambiguation with Pinyin (CPP) suffers from two main limitations. First, it contains some labels that conflict with the latest official dictionary. Second, it is imbalanced, so models trained on it show a learning bias towards frequently used pronunciations and polyphones. In this paper, we refine CPP and release a new dataset named CVTE-poly, which contains 845,254 samples, is nearly ten times the size of CPP, and is more balanced. In addition, we propose a comprehensive evaluation measure for the polyphone disambiguation task that accounts for the data imbalance problem. Experiments show that our simple but flexible baseline trained on CVTE-poly outperforms existing models, which demonstrates the benefit of our dataset.