ISCA Archive Interspeech 2014

The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones

Marco Matassoni, Ramón Fernandez Astudillo, Athanasios Katsamanis, Mirco Ravanelli

Distant speech recognition in real-world environments remains a challenging problem. A particularly interesting topic is multi-channel processing with microphones distributed throughout a home environment. This paper presents an initiative to address the challenges of this scenario: an experimental recognition framework, comprising a multi-room, multi-channel corpus and the accompanying evaluation tools, is made publicly available. The overall goal is to provide a common platform for comparing state-of-the-art algorithms, sharing ideas across research communities, and integrating several components of a realistic distant-talking recognition chain, e.g., voice activity detection, speech/feature enhancement, channel selection and fusion, and model compensation. The recordings include spoken commands (derived from the well-known GRID corpus) mixed with other acoustic events occurring in different rooms of a real apartment. The paper gives a detailed description of the data, tasks, and baseline results, discusses the potential and limits of the approach, and highlights the impact of individual modules on recognition performance.


doi: 10.21437/Interspeech.2014-383

Cite as: Matassoni, M., Astudillo, R.F., Katsamanis, A., Ravanelli, M. (2014) The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones. Proc. Interspeech 2014, 1613-1617, doi: 10.21437/Interspeech.2014-383

@inproceedings{matassoni14_interspeech,
  author={Marco Matassoni and Ramón Fernandez Astudillo and Athanasios Katsamanis and Mirco Ravanelli},
  title={{The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={1613--1617},
  doi={10.21437/Interspeech.2014-383},
  issn={2308-457X}
}