ISCA Archive Eurospeech 1997

The tuning of speech detection in the context of a global evaluation of a voice response system

Laurent Mauuary, Lamia Karray

Field evaluations of automatic speech recognition (ASR) systems clearly demonstrate the importance of efficient rejection procedures for filtering out-of-vocabulary tokens. High-performance speech recognition systems also require efficient speech detection. This paper presents an original framework for the global evaluation of speech recognition systems, which makes it possible to tune the speech detection module of an ASR system. A global evaluation measures the performance of the speech recognition system from the user's point of view and identifies the weak modules of an ASR system. Global evaluations are carried out on PSN (Public Switched Network) and GSM (Global System for Mobile Communications) databases. On the PSN database, global evaluation is used to choose the best value for the speech detector threshold. The results also show that, for this optimal value, the rejection of out-of-vocabulary words is currently the main problem to be solved in building high-performance speech recognition systems for large public telecommunication applications. On the GSM database, global evaluation is used to assess the benefits of speech enhancement before speech detection. The results show that using spectral subtraction as the speech enhancement technique before detection drastically improves speech detection and, consequently, global speech recognition.
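The idea of choosing the detector threshold from a global evaluation can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` record, its fields, the error definition, and the toy values are all assumptions made for illustration. Each token is counted as an error from the user's point of view either when it passes the detector and is then handled incorrectly, or when a token that should have been accepted is missed by the detector. The threshold with the lowest global error rate is retained.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One test token (hypothetical record, not from the paper)."""
    score: float         # detector score, e.g. segment energy in dB (assumed)
    should_accept: bool  # True for in-vocabulary speech, False for noise/OOV
    handled_ok: bool     # True if the recognizer/rejection stage would treat
                         # the token correctly once it has been detected


def global_error_rate(tokens, threshold):
    """Fraction of tokens that end in an error seen by the user."""
    errors = 0
    for tok in tokens:
        if tok.score >= threshold:
            # Detected: the outcome depends on recognition and rejection.
            errors += not tok.handled_ok
        else:
            # Not detected: an error only if the token was a valid word.
            errors += tok.should_accept
    return errors / len(tokens)


def tune_threshold(tokens, candidates):
    """Return the candidate threshold with the lowest global error rate."""
    return min(candidates, key=lambda t: global_error_rate(tokens, t))


# Toy data, invented purely to exercise the functions above.
TOKENS = [
    Token(score=30, should_accept=True, handled_ok=True),
    Token(score=25, should_accept=True, handled_ok=True),
    Token(score=18, should_accept=True, handled_ok=True),
    Token(score=22, should_accept=False, handled_ok=False),  # OOV accepted
    Token(score=12, should_accept=False, handled_ok=False),  # noise accepted
    Token(score=10, should_accept=False, handled_ok=True),   # noise rejected
]

best = tune_threshold(TOKENS, [5, 15, 20, 28])
```

In this toy setting, a low threshold lets noise through to the recognizer, while a high threshold discards valid words, so the global error rate is lowest at an intermediate value. The same loop could compare two front ends (for example, with and without speech enhancement) by scoring each with its own set of tokens.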