ISCA Archive Interspeech 2022

W2V2-Light: A Lightweight Version of Wav2vec 2.0 for Automatic Speech Recognition

Dong-Hyun Kim, Jae-Hong Lee, Ji-Hwan Mo, Joon-Hyuk Chang

Wav2vec 2.0 (W2V2) has shown remarkable speech recognition performance by pre-training only with unlabeled data and fine-tuning with a small amount of labeled data. However, the practical application of W2V2 is hindered by hardware memory limitations, as it contains 317 million parameters. To address this issue, we propose W2V2-Light, a lightweight version of W2V2. We introduce two simple sharing methods to reduce the memory consumption as well as the computational costs of W2V2. Compared to W2V2, our model has 91% fewer parameters and achieves a 1.31-fold speedup, with only minor degradation in downstream task performance. Moreover, by quantifying the stability of representations, we provide an empirical insight into why our model maintains competitive performance despite the significant reduction in memory.
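The abstract does not detail the two sharing methods, but cross-layer parameter sharing is the standard way such reductions are achieved in Transformer stacks. The toy sketch below (hypothetical, not the authors' code; `layer`, `HIDDEN`, and `N_LAYERS` are illustrative names) shows how making all layers reference one weight matrix shrinks the unique parameter count while leaving the forward pass unchanged:

```python
# Hypothetical sketch of cross-layer weight sharing (illustrative only,
# not the W2V2-Light implementation): a 12-layer toy feed-forward stack
# where every layer reuses one shared matrix, versus 12 independent copies.
import numpy as np

HIDDEN = 64     # toy hidden size (illustrative)
N_LAYERS = 12   # toy depth (illustrative)

def layer(x, w):
    # toy feed-forward layer: linear projection + ReLU
    return np.maximum(x @ w, 0.0)

rng = np.random.default_rng(0)

# Independent layers: one weight matrix each.
independent = [rng.standard_normal((HIDDEN, HIDDEN)) for _ in range(N_LAYERS)]

# Shared layers: every layer references the same matrix object.
shared_w = rng.standard_normal((HIDDEN, HIDDEN))
shared = [shared_w] * N_LAYERS

def n_unique_params(weights):
    # Count parameters of distinct weight objects (shared copies count once).
    return sum(w.size for w in {id(w): w for w in weights}.values())

# The forward pass is the same loop in both configurations.
x = rng.standard_normal((1, HIDDEN))
for w in shared:
    x = layer(x, w)

print(n_unique_params(independent))  # 12 * 64 * 64 = 49152
print(n_unique_params(shared))       # 64 * 64 = 4096
```

Under these toy sizes, sharing cuts the stack's unique parameters by a factor equal to the depth; memory savings of the magnitude reported in the abstract come from applying this idea to far larger Transformer layers.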