ISCA Archive Interspeech 2006

CASA based speech separation for robust speech recognition

Runqiang Han, Pei Zhao, Qin Gao, Zhiping Zhang, Hao Wu, Xihong Wu

This paper introduces a speech separation system as a front-end processing step for automatic speech recognition (ASR). It employs computational auditory scene analysis (CASA) to separate the target speech from the interfering speech. Specifically, the mixed speech is first preprocessed with an auditory peripheral model. Pitch tracking is then conducted, and the dominant pitch is used as the main cue for locating the target speech. Next, the time-frequency (T-F) units are merged into segments, which are combined into streams via CASA initial grouping. A regrouping strategy refines these streams using amplitude modulation (AM) cues, and the refined streams are assigned to their corresponding speakers by speaker recognition techniques. Finally, the output streams are reconstructed with a cluster-based feature reconstruction method that compensates for the data discarded in the preceding processing steps. ASR experiments show that at low target-to-masker ratios (TMR < -6 dB) the proposed method offers significantly higher recognition accuracy.
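The pipeline above can be illustrated with a heavily simplified toy sketch. The snippet below is not the authors' system: it replaces the auditory peripheral model with a plain STFT, the full CASA grouping with a per-frame autocorrelation pitch estimate, and the segment/stream organization with a binary mask that keeps only frequency bins near harmonics of the dominant pitch. All function names and parameter values (frame length, hop, pitch search range, harmonic tolerance) are illustrative assumptions.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    # Slice the signal into overlapping frames (toy STFT front end,
    # standing in for the paper's auditory peripheral model).
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def dominant_pitch(frame, fs, fmin=80.0, fmax=400.0):
    # Autocorrelation pitch estimate: the strongest lag in the
    # plausible speech-pitch range gives the dominant pitch.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

def harmonic_mask(freqs, f0, tol=40.0):
    # Keep bins within `tol` Hz of some harmonic k*f0 (k >= 1) --
    # a crude stand-in for CASA segment grouping by pitch cue.
    k = np.maximum(np.round(freqs / f0), 1)
    return np.abs(freqs - k * f0) < tol

def separate(mix, fs, frame_len=256, hop=128):
    # Mask each frame's spectrum by the dominant-pitch harmonics,
    # then overlap-add the masked frames back into a waveform.
    frames = frame_signal(mix, frame_len, hop)
    freqs = np.fft.rfftfreq(frame_len, 1.0 / fs)
    win = np.hanning(frame_len)
    out = np.zeros(len(mix))
    for i, fr in enumerate(frames):
        fw = fr * win
        f0 = dominant_pitch(fw, fs)
        spec = np.fft.rfft(fw) * harmonic_mask(freqs, f0)
        out[i * hop:i * hop + frame_len] += np.fft.irfft(spec, frame_len)
    return out
```

For example, mixing a 150 Hz target tone with a weaker 220 Hz masker and running `separate` yields an output that correlates strongly with the target and weakly with the masker, mirroring (in miniature) the pitch-cued target selection the abstract describes.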

doi: 10.21437/Interspeech.2006-20

Cite as: Han, R., Zhao, P., Gao, Q., Zhang, Z., Wu, H., Wu, X. (2006) CASA based speech separation for robust speech recognition. Proc. Interspeech 2006, paper 2068-Mon1WeS.2, doi: 10.21437/Interspeech.2006-20

@inproceedings{han06_interspeech,
  author={Runqiang Han and Pei Zhao and Qin Gao and Zhiping Zhang and Hao Wu and Xihong Wu},
  title={{CASA based speech separation for robust speech recognition}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 2068-Mon1WeS.2},
  doi={10.21437/Interspeech.2006-20}
}