ISCA Archive Interspeech 2023
ISCA Archive Interspeech 2023

Distilling knowledge from Gaussian process teacher to neural network student

Jeremy H. M. Wong, Huayun Zhang, Nancy F. Chen

Neural Networks (NN) and Gaussian Processes (GP) are different modelling approaches. The former stores characteristics of the training data in its many parameters, and then performs inference by parsing inputs through these parameters. The latter instead performs inference by computing a similarity between the test and training inputs, and then predicts test outputs that are correlated with the reference training outputs of similar inputs. These models may be combined to leverage upon their diversity. However, both combination and the matrix computations for GP inference are expensive. This paper investigates whether a NN student is able to effectively learn from the information distilled from a GP or ensemble teacher. It is computationally cheaper to infer using this student. Experiments on the speechocean762 spoken language assessment dataset suggest that learning is effective.