This paper presents an improved subband neural network applied to joint speech denoising and dereverberation for online single-channel scenarios. Preserving the advantages of subband model (SubNet) that processes each frequency band independently and requires small amount of resources for good generalization, the proposed framework named STSubNet exploits sufficient spectro-temporal receptive fields (STRFs) from speech spectrum via a two-dimensional convolution network cooperating with a bi-directional long short-term memory network across frequency bands, to further improve the neural network discrimination between desired speech component and undesired interference including noise and reverberation. The importance of this STRF extractor is analyzed by evaluating the contribution of individual module to the STSubNet performance for simultaneously denoising and dereverberation. Experimental results show that STSubNet outperforms other subband variants and achieves competitive performance compared to state-of-the-art models on two publicly benchmark test sets.