ISCA Archive Interspeech 2016

Semi-Supervised Speaker Adaptation for In-Vehicle Speech Recognition with Deep Neural Networks

Wonkyum Lee, Kyu J. Han, Ian Lane

In this paper, we present a new i-vector based speaker adaptation method for automatic speech recognition with deep neural networks, focusing on in-vehicle scenarios. Rather than appending i-vectors to acoustic feature vectors to form concatenated input vectors for adapting the neural network acoustic model parameters, our proposed method performs a feature-space transformation with smaller transformation neural networks dedicated to acoustic feature vectors and i-vectors, respectively, followed by a layer that linearly combines the network outputs. This feature-space transformation is learned via semi-supervised learning without any parameter change in the original deep neural network acoustic model. Experimental results show that our proposed method achieves an 18.3% relative improvement in word error rate over the speaker-independent performance, and verify that it has the potential to replace the well-known feature-space Maximum Likelihood Linear Regression (fMLLR) in in-vehicle speech recognition with deep neural networks.
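
The data flow described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the tanh activations, the random initialization, and the choice to realize the linear combination as one affine layer over the concatenated network outputs are all assumptions made for illustration, and the names `small_net` and `transform_features` are hypothetical. Training (the semi-supervised step) and the frozen acoustic model are omitted; only the forward pass of the feature-space transformation is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the abstract does not specify any of these.
FEAT_DIM, IVEC_DIM, HIDDEN = 40, 100, 64


def init_layer(n_in, n_out):
    """Return small random weights and a zero bias for one affine layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def small_net(x, layers):
    """Feed-forward net: tanh hidden layers, linear output layer (assumed)."""
    for w, b in layers[:-1]:
        x = np.tanh(x @ w + b)
    w, b = layers[-1]
    return x @ w + b


# One small transformation network per input stream.
feat_net = [init_layer(FEAT_DIM, HIDDEN), init_layer(HIDDEN, FEAT_DIM)]
ivec_net = [init_layer(IVEC_DIM, HIDDEN), init_layer(HIDDEN, FEAT_DIM)]

# Linear combination layer over the two network outputs (assumed form).
comb_w, comb_b = init_layer(2 * FEAT_DIM, FEAT_DIM)


def transform_features(frames, ivector):
    """Map (T, FEAT_DIM) frames and one (IVEC_DIM,) i-vector to adapted frames."""
    a = small_net(frames, feat_net)               # (T, FEAT_DIM)
    s = small_net(ivector[None, :], ivec_net)     # (1, FEAT_DIM)
    s = np.repeat(s, frames.shape[0], axis=0)     # same speaker vector per frame
    return np.concatenate([a, s], axis=1) @ comb_w + comb_b


frames = rng.normal(size=(5, FEAT_DIM))   # five dummy acoustic frames
ivector = rng.normal(size=IVEC_DIM)       # one dummy speaker i-vector
adapted = transform_features(frames, ivector)
print(adapted.shape)
```

In this sketch the adapted frames keep the original feature dimensionality, so they could be passed to an unchanged acoustic model in place of the raw features, which matches the abstract's statement that the original model's parameters are not modified.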