ISCA Archive PSP 2005

Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization

Constance Clarke, Paul Luce

Recent research suggests that speech perception processes are flexible. For example, Norris et al. [Norris D, McQueen JM, & Cutler A (2003). Cognit. Psychol., 47, 204-238] demonstrated that listeners trained on stimuli containing ambiguous /s/-/f/ tokens subsequently showed an appropriate shift in their /s/-/f/ categorization boundaries based on the lexicality of the training stimuli. Other work has shown that experience with a speaker's voice improves processing of that voice [Nygaard LC & Pisoni DB (1998). Percept. Psychophys., 60, 355-376] and that native listeners appear to perceptually adapt to foreign-accented speech after only brief exposure [Clarke CM & Garrett MF (2004). J. Acoust. Soc. Amer., 116, 3647-3658]. These findings point to a perceptual system that is capable of learning about the variable features of speech. The present study extended this work by exploring changes in acoustic-phonetic criteria for stop category perception (voiced vs. voiceless) following brief exposure to a speaker. In a word-monitoring task, native English listeners heard sentences produced by a native English speaker in which all syllable-initial /t/ and /d/ segments were digitally modified to be atypical of native English pronunciation. Specifically, the /t/ voice onset times (VOTs) were reduced to a mean of 30 ms, and prevoicing was added to the /d/s. (Typical native English /t/s and /d/s are produced with long-lag and short-lag VOTs, respectively.) There were no other stop consonants in the sentences. Categorization of /t/ and /d/ was tested using a 5-token VOT continuum prior to exposure and again following 20, 40, and 60 sentences. As predicted, listeners' mean categorization boundary shifted to a lower VOT after exposure to the modified speech. Further, the boundary shift was evident after the first exposure block, which contained less than two minutes of speech. No shift was found for a control group exposed to the unmodified sentences.
These results suggest that listeners' perceptual criteria for stop consonants can be adjusted to better match a speaker's productions. They also indicate that this learning can occur when the key segments are embedded in full sentences, not only in isolated words. Generalization of learning to the voicing distinction in other stops was also tested. These results will be discussed in terms of whether (a) perceptual learning of an abstract phonetic feature (i.e., voicing) is possible, or (b) each phonetic contrast must be learned on its own. [Work supported by NIDCD.]
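
For readers unfamiliar with the measure, the following is a minimal sketch of one common way a categorization boundary could be estimated from a 5-token VOT continuum: the VOT at which the proportion of /t/ responses crosses 50%, found by linear interpolation. The abstract does not state how the boundary was computed, so this method is an assumption. The function name, the continuum steps, and the response proportions are all hypothetical and are not data from the study.

```python
def category_boundary(vots_ms, prop_voiceless):
    """Estimate the VOT (ms) at which /t/ responses cross 50%.

    Linear interpolation between the two adjacent continuum steps
    that bracket 0.5. This is an illustrative method, not necessarily
    the one used in the study.
    """
    pairs = list(zip(vots_ms, prop_voiceless))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if p0 < 0.5 <= p1:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("response proportions never cross 0.5")


# Hypothetical 5-step continuum and made-up proportions of /t/ responses.
vots_ms = [10, 20, 30, 40, 50]
pre_exposure = [0.05, 0.20, 0.60, 0.90, 0.98]
post_exposure = [0.10, 0.45, 0.85, 0.95, 0.99]

pre_boundary = category_boundary(vots_ms, pre_exposure)
post_boundary = category_boundary(vots_ms, post_exposure)

# A negative value means the boundary moved to a lower VOT after exposure.
shift_ms = post_boundary - pre_boundary
print(f"pre: {pre_boundary:.2f} ms, post: {post_boundary:.2f} ms, "
      f"shift: {shift_ms:.2f} ms")
```

With these invented numbers the post-exposure boundary falls at a lower VOT than the pre-exposure boundary, which is the direction of shift the abstract reports for the modified-speech group. Fitting a logistic function to the responses is a frequent alternative to interpolation.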