User simulators are a principal offline method for training and evaluating human-computer dialog systems. In this paper, we examine simple sequence-to-sequence neural network architectures for training end-to-end, natural language to natural language, user simulators, using only raw logs of previous interactions without any additional human labelling. We compare the neural network-based simulators with a language model (LM)-based approach for creating natural language user simulators. Using both an automatic evaluation using LM perplexity and a human evaluation, we demonstrate that the sequence-to-sequence approaches outperform the LM-based method. We show correlation between LM perplexity and the human evaluation on this task, and discuss the benefits of different neural network architecture variations.