ISCA Archive Interspeech 2010

SNR-based mask compensation for computational auditory scene analysis applied to speech recognition in a car environment

Ji Hun Park, Seon Man Kim, Jae Sam Yoon, Hong Kook Kim, Sung Joo Lee, Yunkeun Lee

In this paper, we propose a computational auditory scene analysis (CASA)-based front-end for two-microphone speech recognition in a car environment. One of the important issues associated with CASA is the accurate estimation of mask information for separating target speech from multi-microphone noisy speech. To this end, the time-frequency mask information is compensated using the signal-to-noise ratio obtained from a beamformer, in order to adjust the amount of noise included in the noisy speech. We evaluate the performance of an automatic speech recognition system employing a CASA-based front-end with the proposed mask compensation method, and compare it with systems employing a CASA-based front-end without mask compensation and a beamforming-based front-end. As a result, the CASA-based front-end with the proposed method achieves relative word error rate (WER) reductions of 26.52% and 8.57% compared with the beamformer and the CASA-based front-end alone, respectively.
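
For readers unfamiliar with the metric, the relative WER reductions quoted above follow the standard definition: the drop in WER divided by the baseline WER, expressed as a percentage. The sketch below illustrates that arithmetic only. The function name `relative_wer_reduction` and the WER values are hypothetical and chosen for illustration; the abstract does not report the absolute WERs.

```python
def relative_wer_reduction(baseline_wer: float, proposed_wer: float) -> float:
    """Return the relative WER reduction of a proposed system, in percent.

    Both arguments are word error rates on the same scale (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


# Hypothetical WERs (in percent), for illustration only; not taken from the paper.
example = relative_wer_reduction(baseline_wer=20.0, proposed_wer=15.0)
print(f"{example:.2f}%")  # prints "25.00%"
```

Under this definition, a relative reduction of 26.52% means the proposed front-end removes roughly a quarter of the word errors made with the beamformer baseline, whatever the absolute WER level.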