ISCA Archive Interspeech 2015

Large vocabulary automatic speech recognition for children

Hank Liao, Golan Pundak, Olivier Siohan, Melissa K. Carroll, Noah Coccaro, Qi-Ming Jiang, Tara N. Sainath, Andrew Senior, Françoise Beaufays, Michiel Bacchiani

Recently, Google launched YouTube Kids, a mobile application for children that uses a speech recognizer built specifically for children's speech. In this paper we present the techniques we explored to build such a system. We describe the use of a neural network classifier to identify matched acoustic training data, and the filtering of language modeling data to reduce the chance of producing offensive results. We also compare long short-term memory (LSTM) recurrent networks to convolutional, LSTM, deep neural networks (CLDNNs). We found that a CLDNN acoustic model outperforms an LSTM across a variety of conditions, but does not model child speech relatively better than adult speech. Overall, these findings allow us to build a successful, state-of-the-art large vocabulary speech recognizer for both children and adults.


doi: 10.21437/Interspeech.2015-373

Cite as: Liao, H., Pundak, G., Siohan, O., Carroll, M.K., Coccaro, N., Jiang, Q.-M., Sainath, T.N., Senior, A., Beaufays, F., Bacchiani, M. (2015) Large vocabulary automatic speech recognition for children. Proc. Interspeech 2015, 1611-1615, doi: 10.21437/Interspeech.2015-373

@inproceedings{liao15_interspeech,
  author={Hank Liao and Golan Pundak and Olivier Siohan and Melissa K. Carroll and Noah Coccaro and Qi-Ming Jiang and Tara N. Sainath and Andrew Senior and Françoise Beaufays and Michiel Bacchiani},
  title={{Large vocabulary automatic speech recognition for children}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={1611--1615},
  doi={10.21437/Interspeech.2015-373},
  issn={2958-1796}
}