This study examined whether the size of the McGurk effect depends on the size of the stimulus set presented in a block. The auditory syllables used in the present experiment were eight Japanese monosyllables: /pa/, /ta/, /ma/, /na/, /ba/, /da/, /ga/, and /ka/. Each auditory syllable was dubbed onto either a compatible visible syllable or a visible syllable discrepant in place of articulation, resulting in 16 audio-visual stimuli. In the small set condition, two auditory consonants appeared in a block: /pa/ and /ta/ in one case, and /ma/ and /na/ in another. In the medium set condition, four appeared: /pa/, /ta/, /ma/, and /na/. In the large set condition, eight appeared: /pa/, /ta/, /ma/, /na/, /ka/, /ba/, /da/, and /ga/. We examined whether the size of the McGurk effect for /pa/, /ta/, /ma/, and /na/ varies with stimulus set size. Participants identified the consonant in three presentation conditions: audio-visual, audio-only, and video-only. Except in the video-only condition, auditory white noise was added (S/N = 0 dB). There was also a clear audio-visual condition in which no auditory noise was added. The results for bimodal discrepant pairs showed that auditory labials differ from auditory nonlabials with respect to the effect of set size: the auditory nonlabials (/ta/, /na/) showed no effect of set size, whereas the size of the McGurk effect for the auditory labials (/pa/, /ma/) depended on the stimulus set size, being larger when a consonant appeared in smaller sets. Unimodal identifications, on the other hand, were not affected by set size. The observed effect of set size on the McGurk effect is discussed in terms of the number of dimensions in auditory and visual information.