Zero-shot cross-lingual spoken language understanding (SLU) has attracted increasing attention. Most existing methods construct a mixed-language context via code-switching. However, because languages differ in syntactic structure, code-switching may perform poorly and lose semantic information. To address this issue, we propose a novel framework termed FC-MTLF, which applies multi-task learning by introducing an auxiliary multilingual neural machine translation (NMT) task to compensate for the shortcomings of code-switching. In addition, we adopt a curriculum learning strategy to further improve performance. Experimental results show that our framework achieves new state-of-the-art performance on the MultiATIS++ dataset. Further analysis verifies that FC-MTLF effectively transfers knowledge from source languages to target languages.
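To make the multi-task setup concrete, the minimal sketch below shows one plausible way to combine the primary SLU loss with the auxiliary NMT loss under a curriculum-style schedule. The linear ramp, the weighting function, and all names here are illustrative assumptions for exposition only, not the paper's actual formulation.

```python
import torch


def curriculum_weight(step: int, total_steps: int, max_weight: float = 1.0) -> float:
    # Hypothetical curriculum: linearly ramp the auxiliary-task weight
    # from 0 to max_weight over the first half of training.
    ramp_steps = max(total_steps // 2, 1)
    return max_weight * min(step / ramp_steps, 1.0)


def multitask_loss(slu_loss: torch.Tensor,
                   nmt_loss: torch.Tensor,
                   step: int,
                   total_steps: int) -> torch.Tensor:
    # Joint objective: primary SLU loss (e.g. intent + slot cross-entropy)
    # plus a curriculum-weighted auxiliary translation loss.
    lam = curriculum_weight(step, total_steps)
    return slu_loss + lam * nmt_loss


if __name__ == "__main__":
    # Toy usage with placeholder loss values.
    slu_loss = torch.tensor(1.2)   # stand-in for intent + slot loss
    nmt_loss = torch.tensor(3.4)   # stand-in for auxiliary NMT loss
    loss = multitask_loss(slu_loss, nmt_loss, step=1000, total_steps=10000)
    print(loss.item())
```

One design consideration with such a schedule is that the auxiliary NMT signal is introduced gradually, so early training focuses on the primary SLU objective before the translation task begins to shape the shared representations.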