Recently, kernel eigenvoices were revisited using kernel representations of distributions for rapid nonlinear speaker adaptation. These representations ensure the validity of the adapted distribution functions and enable expectation-maximisation training. Although gains in word error rate have been demonstrated for rapid speaker adaptation, the approach increases the decoding cost because the number of likelihood evaluations grows substantially. The present paper addresses this issue by providing a coherent framework for systematic probabilistic approaches that reduce the recognition cost while yielding equally powerful adapted models. The common denominator of these approaches is the use of probabilistic criteria, such as the Kullback-Leibler divergence. In the general case, however, the resulting adapted models have full covariance matrices. To overcome this issue, this paper investigates the use of predictive semi-tied transforms to yield diagonal covariances for decoding. Experimental results are presented on a large-vocabulary conversational telephone task.
Index Terms: kernel eigenvoices, compact nonlinear adaptation, Kullback-Leibler divergence