The Dawn of Psychoacoustic Reverse Correlation: A Data-Driven Methodology for Determining Fine Grained Perceptual Cues of Speech Clarity
Paige Tuttösí, H. Henny Yeung, Yue Wang, Jean-Julien Aucouturier, Angelica Lim
The production of clear speech has been extensively explored, and several contributing cues have been identified. However, synthesizing clear speech by mimicking these cues has shown poor results. We suggest that, rather than trying to replicate the cues observed in human-produced clear speech, we should instead use a data-driven approach to understand which cues drive perception. In past work, we used psychoacoustic reverse correlation to show that vowel duration has a particularly important influence on the perception of English vowels among French adult learners of English. Here, we systematically controlled synthesized speech to identify duration patterns that bias a listener toward a specific vowel. We find that increasing the duration of tense vowels improves clarity, whereas increasing the duration of lax vowels reduces their identification accuracy. Moreover, we find that this mechanism is much stronger for those with reduced listening abilities, i.e., French learners of English. We hope that in the future a similar methodology can be used to explore these mechanisms for the hard of hearing.
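
To illustrate the general idea behind psychoacoustic reverse correlation, the following is a minimal sketch on simulated data, not the procedure or data used in this work. It assumes a two-interval task in which each stimulus has randomly perturbed per-segment durations, and it estimates a first-order kernel as the mean perturbation of the chosen stimuli minus that of the rejected stimuli. The simulated listener, the function names (`simulate_trials`, `first_order_kernel`), and all parameter values are hypothetical.

```python
import numpy as np


def simulate_trials(n_trials, true_kernel, rng, noise_sd=1.0):
    """Simulate a two-interval task with a hypothetical linear listener.

    Each trial presents two stimuli whose per-segment durations are
    perturbed by independent standard-normal values (arbitrary units).
    """
    n_segments = len(true_kernel)
    perturbations = rng.normal(0.0, 1.0, size=(n_trials, 2, n_segments))
    # Assumed listener model: evidence is the perturbation projected onto
    # an internal template, plus internal noise.
    evidence = perturbations @ true_kernel
    evidence = evidence + rng.normal(0.0, noise_sd, size=(n_trials, 2))
    chosen = np.argmax(evidence, axis=1)  # index (0 or 1) of the chosen stimulus
    return perturbations, chosen


def first_order_kernel(perturbations, chosen):
    """Mean perturbation of chosen stimuli minus mean of rejected stimuli."""
    trial_idx = np.arange(len(chosen))
    selected = perturbations[trial_idx, chosen]
    rejected = perturbations[trial_idx, 1 - chosen]
    return selected.mean(axis=0) - rejected.mean(axis=0)


# Hypothetical template: only the third of five segments (e.g., the vowel)
# influences the simulated listener's choice.
rng = np.random.default_rng(0)
true_kernel = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
perturbations, chosen = simulate_trials(2000, true_kernel, rng)
estimated = first_order_kernel(perturbations, chosen)
print(np.round(estimated, 2))
```

In this sketch, the segment with the largest kernel weight is the one whose duration most strongly biases the simulated listener's choice, which is the kind of fine-grained cue the methodology is designed to expose.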