Unlike an audio recording device, a human speaker who imitates a heard utterance or reads a sentence aloud must form a cognitive representation of the linguistic object to guide the phonology and phonetics of the spoken output. The current study used two production tasks to explore the prosodic aspect of these representations: an imitation task in which speakers heard and then imitated spontaneous utterances from a Maptask corpus, and a read enactment task in which speakers read the same sentences aloud from a video display. For each task, the resulting utterances were compared for similarity (a) to the original Maptask utterance and (b) to each other. Similarity measures included perceptual accent and boundary labels, syllable durations, and Fujisaki model-based F0 parameters. The imitations showed strong agreement with the stimulus utterances both in their phonological structure (perceptually labeled accents and boundaries) and in several phonetic cues to prosody drawn from measures of duration and F0. Furthermore, agreement between imitated utterances and the original spoken stimulus was higher than agreement among different imitations. Finally, utterances from the read enactment task differed substantially from the original spoken stimulus in their phonology and F0 characteristics, though duration patterns were less variable. Overall, these results are consistent with the view that listeners extract the prosodic form of an utterance in terms of both phonological features and phonetic cues, and that the syntactic and semantic content of the text is not sufficient to determine a reliable prosodic outcome across subjects.
Index Terms: prosody, spontaneous speech, spoken imitation, phonetics and phonology, Fujisaki model