EEG-based sound source tracking in cocktail-party scenarios is challenging because of the complex acoustic environment. To address this, we propose a novel approach, the Geometry-guided Temporal Attention Network (GTAnet), which integrates the spatial configuration of multi-channel EEG recordings with the temporal dynamics of neural activity. GTAnet builds a graph from the geometry of the electrode layout to capture spatial relationships between electrodes, and applies a temporal attention mechanism to highlight the time intervals most relevant to auditory processing. Experimental results show that GTAnet outperforms baseline models and offers interpretable insights into the neural mechanisms underlying auditory scene analysis.
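The abstract does not state how the electrode graph or the attention scores are computed. The sketch below illustrates the two ingredients in NumPy under stated assumptions. Edges are weighted with a Gaussian kernel over Euclidean distances between 3D electrode coordinates, then normalized as in a standard graph convolution. Temporal attention is a softmax over per-time-step scores from a single scoring vector. The function names (`geometry_adjacency`, `temporal_attention`), the kernel width `sigma`, and the random electrode layout are hypothetical illustrations, not GTAnet's actual design.

```python
import numpy as np


def geometry_adjacency(positions, sigma=0.05, threshold=None):
    """Hypothetical geometry-based graph: Gaussian kernel on electrode distances.

    positions: (C, 3) electrode coordinates (assumed to be in metres).
    Returns the symmetrically normalized adjacency D^-1/2 (A + I) D^-1/2.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                # (C, C) pairwise distances
    adj = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))     # closer electrodes -> stronger edges
    np.fill_diagonal(adj, 0.0)
    if threshold is not None:
        adj[adj < threshold] = 0.0                      # optional sparsification
    a_hat = adj + np.eye(len(positions))                # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]


def temporal_attention(features, w):
    """Hypothetical softmax attention over time steps.

    features: (T, F) per-time-step features; w: (F,) scoring vector.
    Returns attention weights (T,) and the attention-pooled feature vector (F,).
    """
    scores = features @ w                               # one score per time step
    scores = scores - scores.max()                      # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    pooled = weights @ features
    return weights, pooled


# Usage on synthetic data: 8 electrodes, 128 time samples.
rng = np.random.default_rng(0)
positions = rng.normal(size=(8, 3)) * 0.05              # hypothetical electrode layout
eeg = rng.normal(size=(8, 128))                          # (channels, time)
A = geometry_adjacency(positions)
h = A @ eeg                                              # mix signals across neighbouring electrodes
weights, pooled = temporal_attention(h.T, rng.normal(size=8))
print(weights.shape, pooled.shape)                       # (128,) (8,)
```

Here the attention weights play the role of an interpretable map over time: peaks in `weights` mark the intervals that contribute most to the pooled representation. In a trained model, `sigma` and the scoring vector would be learned or tuned rather than fixed.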