ISCA Archive Interspeech 2015

Complementary tasks for context-dependent deep neural network acoustic models

Peter Bell, Steve Renals

We have previously found that context-dependent DNN models for automatic speech recognition can be improved with the use of monophone targets as a secondary task for the network. This paper asks whether the improvements derive from the regularising effect of having a much smaller number of monophone outputs, compared to the typical number of tied states, or from the use of targets that are not tied to an arbitrary state-clustering. We investigate the use of factorised targets for left and right context, and targets motivated by articulatory properties of the phonemes. We present results on a large-vocabulary lecture recognition task. Although the regularising effect of monophones seems to be important, all schemes give substantial improvements over the baseline single-task system, even though the cardinality of the outputs is relatively high.
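
As a rough illustration of the multi-task idea described above, the following minimal sketch computes a per-frame training loss that combines a primary tied-state output with a secondary monophone output. It assumes both output layers sit on top of shared hidden layers and that the two cross-entropy terms are simply added with a weight; the function names, the `mono_weight` parameter, and the toy dimensions are hypothetical and are not taken from the paper, which does not state here how the task losses are combined.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def multitask_loss(state_logits, mono_logits, state_target, mono_target,
                   mono_weight=1.0):
    """Per-frame loss: cross-entropy on the primary tied-state output plus
    a weighted cross-entropy on the secondary monophone output.

    mono_weight is an assumed interpolation weight, not a value from the paper.
    """
    primary = -log_softmax(state_logits)[state_target]
    secondary = -log_softmax(mono_logits)[mono_target]
    return primary + mono_weight * secondary


# Toy frame: 4 tied states and 2 monophones (real systems use far more).
state_logits = [0.0, 0.0, 0.0, 0.0]
mono_logits = [0.0, 0.0]

loss = multitask_loss(state_logits, mono_logits, state_target=2, mono_target=1)
single_task = multitask_loss(state_logits, mono_logits, state_target=2,
                             mono_target=1, mono_weight=0.0)
```

Setting `mono_weight` to zero recovers the single-task loss in this sketch. Other secondary targets, such as left or right context labels or articulatory classes, could be substituted for the monophone target in the same way.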