ISCA Archive Interspeech 2012

Hermitian based hidden activation functions for adaptation of hybrid HMM/ANN models

Sabato Marco Siniscalchi, Jinyu Li, Chin-Hui Lee

This work is concerned with speaker adaptation techniques for artificial neural networks (ANNs) implemented as feed-forward multi-layer perceptrons (MLPs) in the context of large vocabulary continuous speech recognition (LVCSR). The most successful speaker adaptation techniques for MLPs augment the neural architecture with a linear transformation network connected to either the input or the output layer. The weights of this additional linear layer are learned during the adaptation phase, while all other weights are kept frozen to avoid over-fitting. As a result, the structures of the speaker-dependent (SD) and speaker-independent (SI) architectures differ, and the number of adaptation parameters depends on the dimension of either the input or the output layer. We propose a more flexible neural architecture for speaker adaptation that overcomes the limits of current approaches. This flexibility comes from adopting hidden activation functions that can be learned directly from the adaptation data, an adaptive capability obtained through the use of orthonormal Hermite polynomials. Experimental evidence gathered on the Nov92 task demonstrates the viability of the proposed technique.
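To make the idea of a learnable hidden activation concrete, the following is a minimal sketch, not the authors' implementation. It assumes the activation is a weighted sum of the first few orthonormal Hermite functions (Hermite polynomials with a Gaussian weight), and that only the expansion coefficients would be updated during adaptation. The abstract does not give the exact formulation, so the function names `hermite_basis` and `hermite_activation` and the coefficient layout are hypothetical.

```python
import numpy as np


def hermite_basis(x, order):
    """Evaluate the orthonormal Hermite functions h_0 .. h_{order-1} at x.

    Uses the standard three-term recurrence; the result has shape
    x.shape + (order,).
    """
    x = np.asarray(x, dtype=float)
    basis = np.empty(x.shape + (order,))
    # h_0(x) = pi^(-1/4) * exp(-x^2 / 2)
    basis[..., 0] = np.pi ** -0.25 * np.exp(-0.5 * x ** 2)
    if order > 1:
        # h_1(x) = sqrt(2) * x * h_0(x)
        basis[..., 1] = np.sqrt(2.0) * x * basis[..., 0]
    for n in range(2, order):
        # h_n(x) = sqrt(2/n) * x * h_{n-1}(x) - sqrt((n-1)/n) * h_{n-2}(x)
        basis[..., n] = (np.sqrt(2.0 / n) * x * basis[..., n - 1]
                         - np.sqrt((n - 1) / n) * basis[..., n - 2])
    return basis


def hermite_activation(x, coeffs):
    """Hypothetical activation: f(x) = sum_n coeffs[n] * h_n(x).

    Assumption: coeffs is the small set of parameters re-estimated on
    adaptation data, while all MLP weights stay frozen.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    return hermite_basis(x, coeffs.shape[-1]) @ coeffs
```

Under this assumption, the number of adaptation parameters is set by the expansion order rather than by the input or output layer dimension, and the SD and SI networks keep the same structure.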

Index Terms: Connectionist speech recognition systems, Neural networks, Adaptation algorithms, Speech recognition


doi: 10.21437/Interspeech.2012-13

Cite as: Siniscalchi, S.M., Li, J., Lee, C.-H. (2012) Hermitian based hidden activation functions for adaptation of hybrid HMM/ANN models. Proc. Interspeech 2012, 2590-2593, doi: 10.21437/Interspeech.2012-13

@inproceedings{siniscalchi12_interspeech,
  author={Sabato Marco Siniscalchi and Jinyu Li and Chin-Hui Lee},
  title={{Hermitian based hidden activation functions for adaptation of hybrid HMM/ANN models}},
  year=2012,
  booktitle={Proc. Interspeech 2012},
  pages={2590--2593},
  doi={10.21437/Interspeech.2012-13},
  issn={2958-1796}
}