ISCA Archive Interspeech 2014

Multichannel speech dereverberation based on convolutive nonnegative tensor factorization for ASR applications

Seyedmahdad Mirsamadi, John H. L. Hansen

Room reverberation is a primary cause of failure in distant speech recognition (DSR) systems. In this study, we present a multichannel spectrum enhancement method for reverberant speech recognition, which extends a single-channel dereverberation algorithm based on convolutive nonnegative matrix factorization (NMF). The generalization to a multichannel scenario is shown to be a special case of convolutive nonnegative tensor factorization (NTF). The presented algorithm integrates information across channels in the magnitude short-time Fourier transform (STFT) domain. As a result, it places no restrictions on the array geometry and requires no information about the source location, making the algorithm particularly suitable for distributed microphone arrays. Experiments are performed on speech data using actual room impulse responses from the AIR database. Relative WER improvements using a clean-trained ASR system vary from +7.1% to +30.1%, depending on the number of channels and the source-to-microphone distance (1 to 3 meters).
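The kind of model the abstract describes can be illustrated with a minimal sketch. It assumes each channel's magnitude spectrogram is approximated, per frequency bin, by a nonnegative convolution along time of one shared clean-speech spectrogram with a channel-specific filter. The abstract does not state the cost function, the update rules, or any additional constraints, so the Euclidean cost and the standard multiplicative updates used here are assumptions, and the names `reverb_model`, `dereverb_ntf`, `H`, `S`, and `L` are hypothetical.

```python
import numpy as np

def reverb_model(H, S):
    """Per-bin nonnegative convolution along time.

    H: (C, K, L) channel filters, S: (K, N) shared clean magnitudes.
    Returns Xhat: (C, K, N) with Xhat[c,k,n] = sum_l H[c,k,l] * S[k,n-l].
    """
    C, K, L = H.shape
    N = S.shape[1]
    Xhat = np.zeros((C, K, N))
    for l in range(L):
        Xhat[:, :, l:] += H[:, :, l:l + 1] * S[None, :, :N - l]
    return Xhat

def dereverb_ntf(X, L=8, n_iter=50, eps=1e-12, seed=0):
    """Estimate shared S and per-channel H from magnitudes X: (C, K, N).

    Assumption: squared-error cost with multiplicative updates;
    not necessarily the cost or updates used in the paper.
    """
    C, K, N = X.shape
    rng = np.random.default_rng(seed)
    H = rng.random((C, K, L)) + eps
    S = X.mean(axis=0) + eps  # initialize with the channel average
    costs = []
    for _ in range(n_iter):
        # Update all filter taps with S held fixed.
        Xhat = reverb_model(H, S)
        for l in range(L):
            S_shift = S[None, :, :N - l]
            num = (X[:, :, l:] * S_shift).sum(axis=2)
            den = (Xhat[:, :, l:] * S_shift).sum(axis=2) + eps
            H[:, :, l] *= num / den
        # Update the shared spectrogram with H held fixed,
        # pooling evidence from every channel.
        Xhat = reverb_model(H, S)
        num = np.zeros((K, N))
        den = np.zeros((K, N))
        for l in range(L):
            num[:, :N - l] += (H[:, :, l:l + 1] * X[:, :, l:]).sum(axis=0)
            den[:, :N - l] += (H[:, :, l:l + 1] * Xhat[:, :, l:]).sum(axis=0)
        S *= num / (den + eps)
        costs.append(float(np.sum((X - reverb_model(H, S)) ** 2)))
    return H, S, costs
```

Because the sketch operates only on magnitudes and shares nothing between channels except `S`, it needs neither the array geometry nor the source position, which is the property the abstract highlights. The factorization has a scale ambiguity between `H` and `S`, so a practical version would likely add a normalization step.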


doi: 10.21437/Interspeech.2014-581

Cite as: Mirsamadi, S., Hansen, J.H.L. (2014) Multichannel speech dereverberation based on convolutive nonnegative tensor factorization for ASR applications. Proc. Interspeech 2014, 2828-2832, doi: 10.21437/Interspeech.2014-581

@inproceedings{mirsamadi14_interspeech,
  author={Seyedmahdad Mirsamadi and John H. L. Hansen},
  title={{Multichannel speech dereverberation based on convolutive nonnegative tensor factorization for ASR applications}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={2828--2832},
  doi={10.21437/Interspeech.2014-581},
  issn={2308-457X}
}