The ability to detect auditory attention from electroencephalography (EEG) offers many possibilities for brain-computer interface (BCI) applications, such as hearing assistive devices. However, learning effective feature representations remains challenging due to the complex spatial and temporal dynamics of multi-channel EEG signals. To address this challenge, we introduce a Spatiotemporal Graph Convolutional Network (ST-GCN) that combines a temporal attention mechanism with a graph convolutional module. The temporal attention mechanism captures the temporal dynamics of EEG segments, while the graph convolutional module learns the spatial patterns across EEG channels. We evaluate the proposed ST-GCN on two publicly available datasets and demonstrate significant improvements over existing state-of-the-art models. These findings suggest that the ST-GCN model has the potential to advance auditory attention detection in real-life BCI applications.
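
To make the two components concrete, the following is a minimal sketch of the general idea, temporal attention over EEG segments followed by a graph convolution over channels, written in PyTorch. The abstract does not specify layer sizes, the attention formulation, or how the channel adjacency is defined, so every class name, hyperparameter, and design choice here (e.g., a learnable adjacency, a single linear attention scorer) is an illustrative assumption rather than the authors' implementation.

```python
# Hypothetical sketch only: shapes, layer sizes, and the adjacency definition
# are assumptions for illustration, not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphConv(nn.Module):
    """One graph convolution over EEG channels: X' = relu(A_hat @ X @ W)."""

    def __init__(self, in_features: int, out_features: int, num_channels: int):
        super().__init__()
        self.weight = nn.Linear(in_features, out_features, bias=False)
        # Learnable adjacency over EEG channels (assumed; a fixed graph from
        # electrode geometry would be another option).
        self.adj = nn.Parameter(torch.eye(num_channels))

    def forward(self, x):                        # x: (batch, channels, features)
        a_hat = torch.softmax(self.adj, dim=-1)  # row-normalized adjacency
        return F.relu(a_hat @ self.weight(x))


class STGCN(nn.Module):
    """Temporal attention over time steps, then graph convolution over channels."""

    def __init__(self, num_channels: int = 64, time_steps: int = 128,
                 hidden: int = 32, num_classes: int = 2):
        super().__init__()
        # Temporal attention: score every time step of every channel.
        self.attn = nn.Linear(time_steps, time_steps)
        self.gcn = GraphConv(time_steps, hidden, num_channels)
        self.classifier = nn.Linear(num_channels * hidden, num_classes)

    def forward(self, x):                        # x: (batch, channels, time)
        # Attention weights along the temporal axis, applied multiplicatively.
        weights = torch.softmax(self.attn(x), dim=-1)
        x = x * weights
        # Spatial pattern learning across channels via graph convolution.
        x = self.gcn(x)
        return self.classifier(x.flatten(start_dim=1))


if __name__ == "__main__":
    eeg = torch.randn(8, 64, 128)    # 8 segments, 64 channels, 128 samples each
    logits = STGCN()(eeg)
    print(logits.shape)              # torch.Size([8, 2]) -> attended-speaker logits
```

In this sketch the attention stage reweights time samples within each segment before any spatial mixing, and the graph convolution then propagates information between channels through the (here, learned) adjacency; a classifier over the flattened channel embeddings produces the attention-decoding decision.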