Both perceptual and acoustic studies of children’s speech independently
suggest that phonological contrasts are continuously refined during
acquisition. This paper considers two traditional acoustic features
for the ‘s’-vs.-‘sh’ contrast (centroid and
peak frequencies) and a novel feature learned from data, evaluating
these features relative to perceptual ratings of children’s productions.
Productions of sibilant fricatives were elicited from 16 adults
and 69 preschool children. A second group of adults rated the children’s
productions on a visual analog scale (VAS). Each production was rated
by multiple listeners; the mean VAS score for each production was used
as its perceptual goodness rating. For each production from the repetition
task, a psychoacoustic spectrum was estimated by passing it through
a filter bank that modeled the auditory periphery. From these spectra,
centroid and peak frequencies, two traditional features for a sibilant
fricative’s place of articulation, were computed. A novel acoustic
measure was derived by inputting the spectra to a graph-based dimensionality-reduction
algorithm.
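
As an illustration of the two traditional features, the following is a minimal sketch that assumes a spectrum is available as a list of channel center frequencies and a matching list of non-negative magnitudes. The function name `centroid_and_peak` and the five-channel example values are hypothetical and are not taken from the study; the psychoacoustic filter bank itself is not reproduced here.

```python
def centroid_and_peak(freqs_hz, magnitudes):
    """Return (centroid, peak) frequency for a spectrum.

    freqs_hz   -- channel center frequencies (assumed representation)
    magnitudes -- non-negative magnitude per channel
    """
    if len(freqs_hz) != len(magnitudes) or len(freqs_hz) == 0:
        raise ValueError("freqs_hz and magnitudes must be non-empty and equal in length")
    total = sum(magnitudes)
    if total <= 0:
        raise ValueError("spectrum must have positive total magnitude")
    # Centroid: magnitude-weighted mean frequency.
    centroid = sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total
    # Peak: frequency of the channel with the largest magnitude.
    peak_index = max(range(len(magnitudes)), key=lambda i: magnitudes[i])
    peak = freqs_hz[peak_index]
    return centroid, peak


# Hypothetical five-channel spectrum (values invented for illustration).
example_freqs = [2000.0, 4000.0, 6000.0, 8000.0, 10000.0]
example_mags = [1.0, 2.0, 4.0, 2.0, 1.0]
example_centroid, example_peak = centroid_and_peak(example_freqs, example_mags)
```

For a symmetric spectrum such as this one, the centroid and the peak coincide; they diverge when spectral energy is skewed toward one end of the frequency range.
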
Simple regression analyses indicated that the novel feature explained
more of the variance in the VAS scores (adjusted R² = 0.569) than
either centroid (adjusted R² = 0.468) or peak frequency (adjusted
R² = 0.254).
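
For readers unfamiliar with the statistic, the sketch below shows how an adjusted R² can be obtained from a simple (one-predictor) least-squares regression, using the standard correction 1 - (1 - R²)(n - 1)/(n - p - 1) with p = 1. The function name `simple_regression_r2` and the toy data are hypothetical; they are not the study's data or analysis code.

```python
def simple_regression_r2(x, y):
    """Fit y = a + b*x by ordinary least squares; return (R^2, adjusted R^2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length inputs with at least 3 observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("response has zero variance")
    r2 = 1.0 - ss_res / ss_tot
    p = 1  # one predictor in a simple regression
    adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adjusted


# Toy data invented for illustration (not from the study).
toy_feature = [1.0, 2.0, 3.0, 4.0]
toy_rating = [1.0, 3.0, 2.0, 4.0]
toy_r2, toy_adjusted_r2 = simple_regression_r2(toy_feature, toy_rating)
```

The adjustment penalizes R² for the number of predictors relative to the sample size, so the adjusted value never exceeds the unadjusted one.
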