This paper describes how people incorporate visual (lip-read) information into speech perception, and how this depends on their native language and culture and on their experience of learning a second language. Studies of lipreading show that humans can easily distinguish labial consonants from nonlabial ones. We then investigated how people integrate auditory and visual speech information using the "McGurk effect" paradigm, in which a labial/nonlabial conflict is introduced between the auditory and visual signals. Cross-language comparisons were made among Japanese, American English, and Chinese speakers. The Japanese and Chinese subjects were less susceptible to the McGurk effect than the American subjects, suggesting the influence of cultural or linguistic factors. For the Chinese subjects, the magnitude of the McGurk effect correlated with the length of time they had lived in a foreign country (Japan), suggesting that audiovisual speech integration changes with second-language learning.