ISCA Archive Interspeech 2012

Acoustic feature-based non-scorable response detection for an automated speaking proficiency assessment

Je Hun Jeon, Su-Youn Yoon

This study presents a method that increases the robustness of automated speech scoring. Responses with sub-optimal characteristics, such as background noise, volume problems, non-English speech, whispered speech, and non-responses, make automated scoring more difficult. For instance, loud background noise distorts the spectral characteristics of speech, and the performance of the prosody and pronunciation features is significantly degraded. As a result, the automated scores of these responses become less reliable. To address this problem, the automated scoring system in this study first filters out non-scorable responses using a filtering model and then predicts the proficiency scores of the remaining responses using a scoring model. In addition to an automatic speech recognition (ASR)-based filter, which demonstrated promising performance in previous studies, a new filter was implemented in this study using acoustic features. The acoustic-based filter achieved performance comparable to that of the ASR-based filter, and the combination of the two models achieved further improvement. The combined filter was evaluated on two actual test products, and it achieved an accuracy rate of over 98% with an F-score of 86%.

Index Terms: automated speech scoring, speech recognition, acoustic features, filtering models, scorable responses
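
The filter-then-score pipeline described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `Response` fields, the thresholds, and the simple OR rule used to combine the acoustic and ASR-based filters (the study may well combine them differently, for example in a single trained classifier). The `f_score` helper shows how an F-score on the non-scorable class could be computed from gold and predicted labels.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Response:
    # Hypothetical per-response summary features (not from the paper).
    mean_energy_db: float   # average signal energy
    snr_db: float           # estimated signal-to-noise ratio
    asr_confidence: float   # recognizer confidence in [0, 1]


def acoustic_filter(r: Response) -> bool:
    """Flag a response as non-scorable from acoustic cues (assumed thresholds)."""
    return r.mean_energy_db < -50.0 or r.snr_db < 5.0


def asr_filter(r: Response) -> bool:
    """Flag a response as non-scorable from ASR confidence (assumed threshold)."""
    return r.asr_confidence < 0.3


def combined_filter(r: Response) -> bool:
    """Assumed combination rule: flag if either filter flags the response."""
    return acoustic_filter(r) or asr_filter(r)


def score_responses(
    responses: Sequence[Response],
    scorer: Callable[[Response], float],
) -> List[Optional[float]]:
    """Stage 1: filter non-scorable responses. Stage 2: score the rest.

    Filtered responses receive None instead of a proficiency score.
    """
    return [None if combined_filter(r) else scorer(r) for r in responses]


def f_score(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """F1 on the non-scorable (True) class."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, a quiet or noisy response is caught by the acoustic filter even when the recognizer is confident, which is one way to read the abstract's claim that the two filters are complementary.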