ISCA Archive Interspeech 2014

Asynchronous stochastic optimization for sequence training of deep neural networks: towards big data

Erik McDermott, Georg Heigold, Pedro J. Moreno, Andrew Senior, Michiel Bacchiani

Previous work presented a proof of concept for sequence training of deep neural networks (DNNs) using asynchronous stochastic optimization, mainly focusing on a small-scale task. The approach offers the potential to leverage both the efficiency of stochastic gradient descent and the scalability of parallel computation. This study presents results for four different voice search tasks to confirm the effectiveness and efficiency of the proposed framework across different conditions: amount of data (from 60 hours to 20,000 hours), type of speech (read speech vs. spontaneous speech), quality of data (supervised vs. unsupervised data), and language. Significant gains over baselines (DNNs trained at the frame level) are found to hold across these conditions. The experimental results are analyzed, and additional practical details for the approach are provided. Furthermore, different sequence training criteria are compared.
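The core idea, asynchronous stochastic optimization, can be illustrated with a generic lock-free ("Hogwild!"-style) SGD sketch: several workers read shared parameters, compute a stochastic gradient, and write updates back without synchronization. This is only a toy illustration of the general technique, not the paper's distributed DNN training framework; the model (a scalar fit to a data mean) and all names here are hypothetical.

```python
import threading
import random

def run_async_sgd(data, n_workers=4, steps_per_worker=5000, lr=0.01):
    """Toy asynchronous SGD: fit scalar w minimizing mean (w - x)^2,
    whose optimum is the mean of the data. Workers update the shared
    parameter concurrently without locking (hypothetical sketch)."""
    params = [0.0]  # shared parameter, updated asynchronously

    def worker():
        for _ in range(steps_per_worker):
            x = random.choice(data)          # stochastic sample
            grad = 2.0 * (params[0] - x)     # gradient of (w - x)^2
            params[0] -= lr * grad           # lock-free update

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return params[0]

if __name__ == "__main__":
    random.seed(0)
    data = [1.0, 2.0, 3.0, 4.0]
    w = run_async_sgd(data)
    print(w)  # fluctuates near the data mean, 2.5
```

Despite stale reads and unsynchronized writes, the updates still drive the parameter toward the optimum, which is the efficiency/scalability trade-off the abstract refers to.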


doi: 10.21437/Interspeech.2014-308

Cite as: McDermott, E., Heigold, G., Moreno, P.J., Senior, A., Bacchiani, M. (2014) Asynchronous stochastic optimization for sequence training of deep neural networks: towards big data. Proc. Interspeech 2014, 1224-1228, doi: 10.21437/Interspeech.2014-308

@inproceedings{mcdermott14_interspeech,
  author={Erik McDermott and Georg Heigold and Pedro J. Moreno and Andrew Senior and Michiel Bacchiani},
  title={{Asynchronous stochastic optimization for sequence training of deep neural networks: towards big data}},
  year={2014},
  booktitle={Proc. Interspeech 2014},
  pages={1224--1228},
  doi={10.21437/Interspeech.2014-308},
  issn={2308-457X}
}