ISCA Archive Interspeech 2006

Super-human multi-talker speech recognition: the IBM 2006 speech separation challenge system

T. Kristjansson, J. Hershey, P. Olsen, S. Rennie, Ramesh Gopinath

We describe a system for model-based speech separation that achieves super-human recognition performance when two talkers speak at similar levels. The system can separate the speech of two speakers from a single-channel recording with remarkable results. It incorporates a novel method for performing two-talker speaker identification and gain estimation. We extend the method of model-based high-resolution signal reconstruction to incorporate temporal dynamics. We report on two methods for introducing dynamics: the first uses dynamics in the acoustic model space; the second incorporates dynamics based on sentence grammar. The addition of temporal constraints leads to dramatic improvements in separation performance. Once the signals have been separated, they are recognized using speaker-dependent labeling.


doi: 10.21437/Interspeech.2006-25

Cite as: Kristjansson, T., Hershey, J., Olsen, P., Rennie, S., Gopinath, R. (2006) Super-human multi-talker speech recognition: the IBM 2006 speech separation challenge system. Proc. Interspeech 2006, paper 1775-Mon1WeS.7, doi: 10.21437/Interspeech.2006-25

@inproceedings{kristjansson06_interspeech,
  author={T. Kristjansson and J. Hershey and P. Olsen and S. Rennie and Ramesh Gopinath},
  title={{Super-human multi-talker speech recognition: the IBM 2006 speech separation challenge system}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1775-Mon1WeS.7},
  doi={10.21437/Interspeech.2006-25},
  issn={2958-1796}
}