ISCA Archive Interspeech 2012

Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization

Brian Kingsbury, Tara N. Sainath, Hagen Soltau

Training neural network acoustic models with sequence-discriminative criteria, such as state-level minimum Bayes risk (sMBR), has been shown to produce large improvements in performance over cross-entropy. However, because they entail the processing of lattices, sequence criteria are much more computationally intensive than cross-entropy. We describe a distributed neural network training algorithm, based on Hessian-free optimization, that scales to deep networks and large data sets. For the sMBR criterion, this training algorithm is faster than stochastic gradient descent by a factor of 5.5 and yields a 4.4% relative improvement in word error rate on a 50-hour broadcast news task. Distributed Hessian-free sMBR training yields relative reductions in word error rate of 7-13% over cross-entropy training with stochastic gradient descent on two larger tasks: Switchboard and DARPA RATS noisy Levantine Arabic. Our best Switchboard DBN achieves a word error rate of 16.4% on rt03-FSH.
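The core of Hessian-free optimization is a truncated-Newton update: a search direction is obtained by running conjugate gradient using only curvature-vector products, so the curvature matrix is never formed explicitly. Below is a minimal sketch of one such update on a toy linear least-squares model, where the Gauss-Newton product is exactly J^T(Jv) plus damping. The names (cg_solve, apply_G, damping) and the toy problem are illustrative assumptions, not from the paper, and the distributed, lattice-based sMBR machinery the authors describe is omitted.

import numpy as np

def cg_solve(apply_A, b, max_iters=50, tol=1e-8):
    # Conjugate gradient for A x = b; needs only matrix-vector products.
    x = np.zeros_like(b)
    r = b - apply_A(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        Ap = apply_A(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy problem: weights w of a linear model, loss 0.5 * ||J w - t||^2.
rng = np.random.default_rng(0)
J = rng.normal(size=(200, 20))   # Jacobian of outputs w.r.t. weights (hypothetical)
t = rng.normal(size=200)         # regression targets (hypothetical)
w = np.zeros(20)
damping = 1e-2                   # Tikhonov damping, standard in HF training

grad = J.T @ (J @ w - t)                           # loss gradient at w
apply_G = lambda v: J.T @ (J @ v) + damping * v    # damped Gauss-Newton product
w += cg_solve(apply_G, -grad)                      # truncated-Newton step

In the paper's distributed setting, the gradient and each curvature-vector product would be computed over data shards on multiple workers and reduced, which is what makes the approach scale to large data sets.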

Index Terms: deep learning, discriminative training, second-order optimization, distributed computing


doi: 10.21437/Interspeech.2012-3

Cite as: Kingsbury, B., Sainath, T.N., Soltau, H. (2012) Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization. Proc. Interspeech 2012, 10-13, doi: 10.21437/Interspeech.2012-3

@inproceedings{kingsbury12_interspeech,
  author={Brian Kingsbury and Tara N. Sainath and Hagen Soltau},
  title={{Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization}},
  year=2012,
  booktitle={Proc. Interspeech 2012},
  pages={10--13},
  doi={10.21437/Interspeech.2012-3},
  issn={2958-1796}
}