Auditory Attention Decoding (AAD) identifies which sound source a listener is attending to in a complex auditory scene from their cortical neural responses. DNN-based methods have achieved high decoding accuracy on public EEG datasets. However, this accuracy may be overestimated, because the models might learn temporal-autocorrelation features of the EEG rather than features related to auditory attention. The risks introduced by data-splitting strategies have been discussed, but the risks introduced by experimental design have not. In this work, we collected a joint scalp-EEG and ear-EEG dataset with a non-block design (NBD) and used DNN-based models to compare it with previous block-design (BD) datasets. Accuracy drops significantly from the BD datasets to the NBD dataset, whereas a linear stimulus-reconstruction model remains robust. Inter-trial phase coherence analysis confirms stronger neural phase-locking to the attended speech in the BD dataset. These findings suggest that a block design enhances the coherence of neural responses but risks overestimating AAD accuracy. The code and data are publicly released.