Many ASR engines are trained on crowdsourced speech corpora such as Common Voice. Although crowdsourced data is inexpensive, the collected utterances can be noisy because of uncontrollable factors such as speaker accents and recording environments. The Common Voice corpus also has too few validators to cover its vast collection of crowdsourced utterances, which makes speech data validation a significant bottleneck. To mitigate this bottleneck, we propose a machine-learning classifier that predicts whether an utterance is correct and that can act either as the validator itself or as a prescreen for human validators. Our system achieves an F1-score above 95% on three Common Voice languages (Thai, Japanese, and Turkish) and performs even better when only one human judge is involved in the decision. Furthermore, ASR models trained on data validated by our method outperform those trained on data from the current crowdsourced validation method.