ISCA Archive Interspeech 2014

Large-margin conditional random fields for single-microphone speech separation

Yu Ting Yeung, Tan Lee, Cheung-Chi Leung

Conditional random field (CRF) formulations for single-microphone speech separation are improved by large-margin parameter estimation. Speech sources are represented by acoustic state sequences from speaker-dependent acoustic models. The large-margin technique improves the classification accuracy of acoustic states by reducing generalization error in the training phase. Non-linear mappings inspired by the mixture-maximization (MIXMAX) model are applied to speech mixture observations. Compared with a factorial hidden Markov model baseline, the improved CRF formulations achieve better separation performance with significantly less training data. The separation performance is evaluated in terms of objective speech quality measures and speech recognition accuracy on the reconstructed sources. Compared with the CRF formulations without large-margin parameter estimation, the improved formulations achieve better performance without modifying the statistical inference procedures, especially when the sources are modeled with an increased number of acoustic states.
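As background for the MIXMAX model mentioned above, the following minimal sketch illustrates the general MIXMAX approximation: the log-spectrum of a two-source mixture is approximated by the element-wise maximum of the source log-spectra. It is not the paper's non-linear mapping, which the abstract does not specify. The function names, the random test spectra, and the assumption that the sources add exactly in the power domain are illustrative choices only.

```python
import math
import random


def mixmax_approx(log_x1, log_x2):
    """Element-wise maximum of two log-power spectra (MIXMAX approximation)."""
    return [max(a, b) for a, b in zip(log_x1, log_x2)]


def exact_log_mixture(log_x1, log_x2):
    """Log of the summed power spectra.

    Assumption: the two sources add in the power domain
    (cross terms between the sources are ignored).
    """
    return [math.log(math.exp(a) + math.exp(b)) for a, b in zip(log_x1, log_x2)]


# Hypothetical log-power spectra for two sources (8 frequency bins each).
random.seed(0)
log_x1 = [random.uniform(-5.0, 5.0) for _ in range(8)]
log_x2 = [random.uniform(-5.0, 5.0) for _ in range(8)]

approx = mixmax_approx(log_x1, log_x2)
exact = exact_log_mixture(log_x1, log_x2)

# Per-bin gap between the exact log-sum and the max approximation.
errors = [e - a for e, a in zip(exact, approx)]
```

Under the power-domain additivity assumption, each entry of `errors` lies between 0 and log 2, since log(a + b) - max(log a, log b) = log(1 + min(a, b) / max(a, b)). The gap is largest when the two sources have equal energy in a bin and shrinks as one source dominates.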

doi: 10.21437/Interspeech.2014-259

Cite as: Yeung, Y.T., Lee, T., Leung, C.-C. (2014) Large-margin conditional random fields for single-microphone speech separation. Proc. Interspeech 2014, 983-987, doi: 10.21437/Interspeech.2014-259

  author={Yu Ting Yeung and Tan Lee and Cheung-Chi Leung},
  title={{Large-margin conditional random fields for single-microphone speech separation}},
  booktitle={Proc. Interspeech 2014},