The availability of realistic simulated corpora is of key importance
for the future progress of distant speech recognition technology. The
reliability, flexibility and low computational cost of a data simulation
process may ultimately allow researchers to train, tune and test different
techniques in a variety of acoustic scenarios, avoiding the laborious
effort of directly recording real data from the targeted environment.
In the last decade, several simulated corpora have been released
to the research community, including the datasets distributed in the
context of projects and international challenges, such as CHiME and
REVERB. These efforts were extremely useful to derive baselines and
common evaluation frameworks for comparison purposes. At the same time,
in many cases they highlighted the need for better coherence between
real and simulated conditions.
In this paper, we examine this issue and describe our approach to the
generation of realistic corpora in a domestic context. Experimental validation, conducted
in a multi-microphone scenario, shows that a comparable performance
trend can be observed with both real and simulated data across different
recognition frameworks, acoustic models, and multi-microphone
processing techniques.