We applied a physical model of the human vocal tract, originally designed to simulate English /r/, and tested whether it can produce a range of vowels, especially mid front vowels. We first confirmed that the model can produce such vowels with high intelligibility. By changing the tongue height of the model, learners can adjust the vowel quality while listening to the output sounds and receiving a tactile sensation. We therefore used the model for pronunciation training as a hands-on tool for phonetic education, based on the consideration that tongue and finger movements are related in terms of motor control. We demonstrated vowel production using the model and received feedback from a group of listeners engaged in phonetic education. The synergistic effect of visual, auditory, and tactile sensations was pointed out as an advantage. We then conducted a production experiment in which participants were asked to repeat each vowel they heard and to produce that vowel by manipulating the vocal-tract model. As a result, slight training effects were observed when using the physical model: formant frequencies approached the target frequencies as the experimental session progressed.