ISCA Archive Interspeech 2012

Parallel training for deep stacking networks

Li Deng, Brian Hutchinson, Dong Yu

The deep stacking network (DSN) is a special type of deep architecture developed to enable parallel learning of its weight parameters distributed over large CPU clusters. This capacity for parallel learning is unique among the deep models explored so far. As a prospective key component of next-generation speech recognizers, the architectural design of the DSN and its parallel learning enable the DSN's scalability over a potentially unlimited amount of training data and over CPU clusters. In this paper, we present our first parallel implementation of the DSN learning algorithm. In particular, we show the tradeoff between the time and memory savings obtained through a high degree of parallelism and the associated cost arising from inter-CPU communication. In addition, in phone classification experiments, we demonstrate that the DSN achieves a significantly lower error rate with full-batch training, which the parallel implementation on a CPU cluster makes possible, than with the corresponding mini-batch training used prior to the work reported in this paper.
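
The following is a minimal sketch of why full-batch training lends itself to data parallelism: a full-batch gradient is a sum over training samples, so each worker can compute a partial sum on its own shard and only the partial results need to be communicated. It is not the DSN learning algorithm from the paper. The one-dimensional least-squares model, the function names (`shard_gradient`, `split`, `full_batch_gradient`), and the use of threads as stand-ins for cluster nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_gradient(w, shard):
    """Sum of squared-error gradients for the toy model y ~ w * x over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard)


def split(data, n_workers):
    """Split data into contiguous shards of near-equal size (at most n_workers)."""
    size = -(-len(data) // n_workers)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def full_batch_gradient(w, data, n_workers=4):
    """Full-batch gradient assembled from per-shard partial sums.

    Threads here only imitate cluster nodes (hypothetical setup); the single
    reduction step at the end stands in for the inter-CPU communication.
    """
    shards = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda s: shard_gradient(w, s), shards))
    return sum(partials) / len(data)


# Toy data generated from y = 3 * x, so the optimal weight is 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]

w = 0.0
for _ in range(200):
    w -= 0.01 * full_batch_gradient(w, data)
```

In this sketch each worker returns a single number per iteration, so the communication cost grows with the number of workers while the per-worker computation shrinks with shard size, which is the kind of tradeoff the abstract refers to.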

Index Terms: parallel and distributed computing, deep stacking networks, full-batch training, phone classification

doi: 10.21437/Interspeech.2012-15

Cite as: Deng, L., Hutchinson, B., Yu, D. (2012) Parallel training for deep stacking networks. Proc. Interspeech 2012, 2598-2601, doi: 10.21437/Interspeech.2012-15

  author={Li Deng and Brian Hutchinson and Dong Yu},
  title={{Parallel training for deep stacking networks}},
  booktitle={Proc. Interspeech 2012},