Spoken language understanding (SLU) tasks such as goal estimation and intention identification from users' commands are essential components of spoken dialog systems. In recent years, neural network approaches have shown great success in various SLU tasks. However, one major difficulty of SLU is that annotating collected data can be expensive, often leaving insufficient data available for a task. A neural network trained in such low-resource conditions usually performs poorly because of overfitting. To improve performance, this paper investigates unsupervised pre-training of the SLU networks on large-scale corpora using word embeddings and latent topic models. In order to capture long-term characteristics over the entire dialog, we propose a novel Recurrent Neural Network (RNN) architecture. The proposed RNN uses two sub-networks to model the different time scales represented by word and turn sequences. The combination of pre-training and the proposed RNN yields an 18% relative error reduction compared to a baseline system.
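
The two-timescale architecture can be pictured as a word-level encoder nested inside a turn-level recurrence: one sub-network reads the words of each turn, and a second sub-network runs over the resulting turn vectors to track dialog-level context. The following sketch is only an illustration of that idea, not the authors' implementation; the choice of GRU cells, the layer sizes, and all names are assumptions introduced here.

```python
import torch
import torch.nn as nn


class TwoTimescaleRNN(nn.Module):
    """Illustrative two-timescale RNN: word-level and turn-level sub-networks."""

    def __init__(self, vocab_size, embed_dim, word_hidden, turn_hidden, num_labels):
        super().__init__()
        # Embedding layer; in the paper's setting this could be initialized
        # from unsupervised word embeddings trained on large corpora.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Word-level sub-network: processes the word sequence within one turn.
        self.word_rnn = nn.GRU(embed_dim, word_hidden, batch_first=True)
        # Turn-level sub-network: processes the sequence of turn representations.
        self.turn_rnn = nn.GRU(word_hidden, turn_hidden, batch_first=True)
        # Classifier for a per-turn SLU label (e.g., user goal or intention).
        self.classifier = nn.Linear(turn_hidden, num_labels)

    def forward(self, dialog):
        # dialog: LongTensor of shape (num_turns, max_words) for one dialog.
        turn_vectors = []
        for turn in dialog:                       # iterate over turns
            emb = self.embed(turn.unsqueeze(0))   # (1, max_words, embed_dim)
            _, h = self.word_rnn(emb)             # h: (1, 1, word_hidden)
            turn_vectors.append(h.squeeze(0))     # (1, word_hidden)
        turns = torch.stack(turn_vectors, dim=1)  # (1, num_turns, word_hidden)
        out, _ = self.turn_rnn(turns)             # (1, num_turns, turn_hidden)
        return self.classifier(out).squeeze(0)    # (num_turns, num_labels)


# Toy usage with placeholder dimensions.
model = TwoTimescaleRNN(vocab_size=1000, embed_dim=64,
                        word_hidden=128, turn_hidden=128, num_labels=10)
dialog = torch.randint(0, 1000, (5, 12))          # 5 turns, 12 word ids each
logits = model(dialog)                            # (5, 10) per-turn label scores
```

In this sketch the turn-level recurrence is what carries information across the entire dialog, which is the role the paper assigns to the slower of the two time scales.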