ISCA Archive PSP 2005

A model based upon response fields derived during early experience can account for the interference effects of synthetically degraded speech signals

Susan Denham, Martin Coath

In animals, adult-like response properties in cortex develop through exposure to sounds during an early critical period [1]. The structure of spectrotemporal response fields (STRFs) in human auditory cortex is not known, but if they too develop through early acoustic experience, then speech is likely to play a large part in their formation. We investigated this hypothesis by developing a model of auditory processing in which STRFs were derived from fragments of a limited set of utterances [2]. We found that the responses of an ensemble of STRFs supported the classification of novel words and were robust to variability introduced by differences in speaker, sex and accent. Furthermore, the ensemble response could be interpreted in qualitatively different ways, e.g., to classify the sex and identity of the speaker, or the prosody of the word [3]. The summed response of the ensemble of STRFs clearly indicates the presence of discrete events in an ongoing stream of sounds, and provides a way of quantifying the responsiveness of the model to arbitrary sounds. This suggests that the strength of the ensemble response to a sound could be used to predict its effectiveness as an interferer: the stronger the response, the more the sound should interfere. The entropy of the ensemble response was quantified for the range of degradations used in a recent study [4], and it mirrored the interference effects of time-reversed speech, sine-wave speech and modulated sine-band speech. However, the basic linear model failed completely to account for the reduction in interference as the number of noise bands in modulated noise-band speech increases. Introducing an output non-linearity in the form of divisive inhibition [5] between the STRFs rectified this problem.
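The abstract does not spell out the model equations, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes that each STRF acts as a linear filter on a spectrogram, r_k(t) = Σ_f Σ_τ STRF_k(f, τ) S(f, t − τ); that the summed ensemble response adds up half-wave rectified unit responses; that the "entropy of the ensemble response" can be read as the Shannon entropy, per time frame, of the rectified responses normalized across units; and that divisive inhibition takes the squared-response form R_k = r_k² / (σ² + Σ_j r_j²) with uniform pooling weights, in the spirit of [5]. All function names (strf_response, ensemble_responses, summed_response, frame_entropy, divisive_inhibition) and the constant sigma are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # spectrogram or STRF, indexed [frequency][time or lag]


def strf_response(strf: Matrix, spec: Matrix) -> List[float]:
    """Linear STRF response: r(t) = sum_f sum_tau strf[f][tau] * spec[f][t - tau]."""
    n_freq, n_lag = len(strf), len(strf[0])
    n_time = len(spec[0])
    out = []
    for t in range(n_time):
        total = 0.0
        for f in range(n_freq):
            for tau in range(n_lag):
                if t - tau >= 0:  # causal filter: ignore samples before the signal starts
                    total += strf[f][tau] * spec[f][t - tau]
        out.append(total)
    return out


def ensemble_responses(strfs: List[Matrix], spec: Matrix) -> Matrix:
    """One linear response time course per STRF, indexed [unit][time]."""
    return [strf_response(s, spec) for s in strfs]


def summed_response(resp: Matrix) -> List[float]:
    """Summed ensemble response per frame.
    ASSUMPTION: unit responses are half-wave rectified before summing."""
    return [sum(max(r[t], 0.0) for r in resp) for t in range(len(resp[0]))]


def frame_entropy(resp: Matrix) -> List[float]:
    """Shannon entropy (bits) of rectified responses normalized across units, per frame.
    ASSUMPTION: one possible reading of 'entropy of the ensemble response'."""
    ent = []
    for t in range(len(resp[0])):
        vals = [max(r[t], 0.0) for r in resp]
        total = sum(vals)
        if total == 0.0:  # silent frame: define entropy as zero
            ent.append(0.0)
            continue
        ent.append(-sum((v / total) * math.log2(v / total) for v in vals if v > 0.0))
    return ent


def divisive_inhibition(resp: Matrix, sigma: float = 1.0) -> Matrix:
    """Divisive normalization: R_k(t) = r_k(t)^2 / (sigma^2 + sum_j r_j(t)^2).
    ASSUMPTION: uniform pooling weights over all units and a free constant sigma."""
    n_time = len(resp[0])
    out = [[0.0] * n_time for _ in resp]
    for t in range(n_time):
        pool = sigma ** 2 + sum(r[t] ** 2 for r in resp)  # pooled ensemble activity
        for k, r in enumerate(resp):
            out[k][t] = r[t] ** 2 / pool
    return out


if __name__ == "__main__":
    # Toy two-channel spectrogram (2 frequencies x 3 frames), one STRF per channel.
    spec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strfs = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]]]
    resp = ensemble_responses(strfs, spec)
    print(summed_response(resp), frame_entropy(resp), divisive_inhibition(resp))
```

Because the pooled term in divisive_inhibition grows with the activity of every unit, a given unit's normalized output falls as more units respond at the same time, which is the kind of ensemble-level interaction a purely linear model cannot express.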
We conclude that the model may provide a useful way to predict the degree to which any sound will interfere with speech perception, and that it could be used to investigate the influence of different acoustic environments on the formation of STRFs during early development and on subsequent perceptual abilities.

References

[1] Zhang, L.I., S. Bao, and M.M. Merzenich, Nat Neurosci, 2001. 4(11): p. 1123-30.
[2] Coath, M. and S.L. Denham, Biological Cybernetics, 2004. Submitted.
[3] Coath, M., et al., Network: Computation in Neural Systems, special issue on Sensory Coding and the Natural Environment, 2004. Submitted.
[4] Brungart, D.S., et al., J Acoust Soc Am, 2005. 117(1): p. 292-304.
[5] Schwartz, O. and E.P. Simoncelli, in NIPS-00, 2000. Denver: MIT Press.