ISCA Archive Interspeech 2024

Convolution-Augmented Parameter-Efficient Fine-Tuning for Speech Recognition

Kwangyoun Kim, Suwon Shon, Yi-Te Hsu, Prashant Sridhar, Karen Livescu, Shinji Watanabe

Parameter-efficient fine-tuning (PEFT) methods, which update only a small subset of a model's parameters, yield efficient and effective models. Bottleneck approaches, such as adapters and low-rank adaptation (LoRA), have proven beneficial in numerous studies and are widely used. In this work, we propose and investigate an enhanced PEFT method that adds convolution to linear projection-based bottleneck approaches. We experiment with HuBERT, a representative speech model pre-trained with self-supervised learning, and fine-tune it for automatic speech recognition (ASR) to examine how the proposed PEFT method affects training and inference. We demonstrate consistent performance improvements with a minimal increase in parameters and computational complexity.
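
The abstract does not specify the exact module design, so the sketch below is only one plausible illustration of the general idea: a linear bottleneck adapter augmented with a depthwise 1-D convolution over the time axis, as is common in speech architectures. All names, the conv placement, and the hyperparameters here are hypothetical assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn


class ConvAdapter(nn.Module):
    """Illustrative bottleneck adapter with a depthwise 1-D convolution
    inserted between the down- and up-projections. The placement,
    kernel size, and activation are assumptions for demonstration,
    not the paper's exact design."""

    def __init__(self, d_model: int, bottleneck: int = 64, kernel_size: int = 3):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        # Depthwise conv mixes local temporal context inside the bottleneck,
        # adding few parameters relative to the linear projections.
        self.conv = nn.Conv1d(
            bottleneck, bottleneck, kernel_size,
            padding=kernel_size // 2, groups=bottleneck,
        )
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        h = self.down(x)
        # Conv1d expects (batch, channels, time), so transpose around the conv.
        h = self.conv(h.transpose(1, 2)).transpose(1, 2)
        h = self.act(h)
        # Residual connection: the frozen backbone's output passes through.
        return x + self.up(h)
```

In a PEFT setup of this kind, the pre-trained backbone (e.g., HuBERT) would be frozen and only the adapter parameters trained, so the added convolution increases the trainable parameter count and compute only marginally.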