ISCA Archive SSPR 2003

An assessment of automatic recognition techniques for spontaneous speech in comparison with human performance

Takahiro Shinozaki, Sadaoki Furui

To investigate the problems of spontaneous speech recognition using N-grams and HMMs, and to estimate the room for improvement in the recognition rate, an automatic speech recognizer is evaluated against the performance of human listeners. The evaluation task is to recognize spontaneous speech presentations from the Corpus of Spontaneous Japanese. Both the automatic recognizer and the human listeners are asked to choose the most likely word from a dictionary, given a three-word speech segment extracted from a presentation: the target word plus one word of context on each side. Recognition performance is compared using the same criteria in both experiments. The results show that the recognition error rate of human listeners is roughly half that of the recognizer. An examination of words that are easy for humans but difficult for the recognizer shows that the causes of the decoder's recognition errors include insufficient model accuracy and a lack of robustness against vague and variable pronunciations.
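
The comparison described above can be illustrated with a minimal sketch, assuming each trial records the reference word, the word chosen by a human listener, and the word chosen by the recognizer. The function names, the placeholder tokens, and the toy data are hypothetical and are not taken from the paper; the sketch only shows how one scoring criterion can be applied to both experiments and how words that are easy for humans but difficult for the recognizer might be singled out.

```python
from typing import List, Sequence


def error_rate(references: Sequence[str], choices: Sequence[str]) -> float:
    """Fraction of trials where the chosen word differs from the reference."""
    if len(references) != len(choices):
        raise ValueError("references and choices must have the same length")
    if not references:
        raise ValueError("at least one trial is required")
    errors = sum(ref != chosen for ref, chosen in zip(references, choices))
    return errors / len(references)


def human_only_correct(
    references: Sequence[str],
    human: Sequence[str],
    machine: Sequence[str],
) -> List[int]:
    """Indices of trials the human got right and the recognizer got wrong."""
    return [
        i
        for i, (ref, h, m) in enumerate(zip(references, human, machine))
        if h == ref and m != ref
    ]


# Hypothetical toy data: placeholder tokens, not results from the paper.
references = ["w1", "w2", "w3", "w4"]
human = ["w1", "w2", "w3", "x"]    # one human error
machine = ["w1", "y", "w3", "x"]   # two recognizer errors

human_err = error_rate(references, human)
machine_err = error_rate(references, machine)
hard_for_machine = human_only_correct(references, human, machine)
```

Scoring both sets of answers with the same function mirrors the requirement that the two experiments use identical criteria; the indices returned by `human_only_correct` mark the trials one would inspect when looking for recognizer-specific error causes.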