Research on computational speech processing has traditionally relied
on the availability of a relatively large and complex infrastructure,
which encompasses data (text and audio), tools (e.g., for feature
extraction, model training, and scoring, possibly both on-line and
off-line), glue code, and computing resources. It has therefore been very
hard to move experiments from one site to another, or to replicate them. With the increasing
availability of shared platforms such as commercial cloud computing
services or publicly funded supercomputing centers, there is both a need
and an opportunity to abstract the experimental environment from the
hardware and to distribute complete setups as a virtual machine, a container,
or some other shareable resource that can be deployed and worked with
anywhere.
In this paper, we discuss our experience with this concept and
present some tools that the community might find useful. We outline,
as a case study, how such tools can be applied to a naturalistic language
acquisition audio corpus.