Somatosensory inputs are important for acquiring precise control of movement [1]. In speech, receiving somatosensory inputs together with the corresponding speech sounds may be key to establishing or calibrating the speech production system [2]. Here we examined whether speech production can be modulated by perceptual training with repeated exposure to paired auditory-somatosensory stimulation, without the sound actually being produced. The perceptual training used a vowel identification task along an /e/-/eu/ continuum. The speech sounds were accompanied by somatosensory stimulation in which the facial skin was stretched in the backward direction. The vowels /e/ and /eu/ were recorded before and after the training, and their first three formants were compared. The results showed that the third formant of /e/ increased after the training, whereas the other formants did not change. Because this somatosensory stimulation was related to the articulatory movement used to produce /e/ (lip spreading), repeated exposure to the somatosensory stimulation together with the sound may specifically change the articulatory behavior for producing /e/. These results suggest that perceptual training with specific pairings of auditory and somatosensory inputs can play an important role in shaping speech production mechanisms.