ISCA Archive Interspeech 2023

Lossless 4-bit Quantization of Architecture Compressed Conformer ASR Systems on the 300-hr Switchboard Corpus

Zhaoqing Li, Tianzi Wang, Jiajun Deng, Junhao Xu, Shoukang Hu, Xunying Liu

State-of-the-art end-to-end automatic speech recognition (ASR) systems are becoming increasingly complex and expensive to deploy in practical applications. This paper develops a high-performance, low-footprint 4-bit quantized Conformer ASR system. A key feature of the system design is that it accounts for the fine-grained, varying performance sensitivity of different Conformer components to quantization errors. Neural architectural compression and mixed-precision quantization approaches were used to auto-configure the optimal substructures and quantization bit-widths within each Conformer submodule. Experiments conducted on the 300-hr Switchboard corpus suggest that the resulting auto-configured systems consistently outperform the uniform-precision quantized baseline Conformer of comparable bit-widths in terms of word error rate (WER). An overall "lossless" compression ratio of 16.2 times was obtained over the 32-bit full-precision baseline while incurring no statistically significant WER increase.
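To make the notions of low-bit quantization and compression ratio concrete, the following is a minimal illustrative sketch, not the method used in the paper. It assumes a simple symmetric uniform quantizer applied to a flat list of weights; the function names (`quantize_uniform`, `dequantize`, `compression_ratio`) and the example parameter counts are hypothetical and chosen only for illustration.

```python
def quantize_uniform(weights, bits=4):
    """Symmetric uniform quantization of a list of floats (illustrative only).

    Returns integer codes in [-qmax, qmax] and the scale used to map them back.
    """
    qmax = 2 ** (bits - 1) - 1                      # 7 for signed 4-bit codes
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def compression_ratio(base_params, base_bits, comp_params, avg_bits):
    """Ratio of baseline model size to compressed model size, both in bits."""
    return (base_params * base_bits) / (comp_params * avg_bits)


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]                # hypothetical weights
    codes, scale = quantize_uniform(weights, bits=4)
    restored = dequantize(codes, scale)
    print(codes, restored)

    # Reducing 32-bit weights to 4 bits with no change in parameter count:
    print(compression_ratio(100, 32, 100, 4))       # 8.0
```

Under this simplified accounting, going from 32 bits to 4 bits contributes a factor of 8 on its own. If the average bit-width were close to 4, the reported 16.2 times ratio would imply roughly a further factor of 2 (16.2 / 8 ≈ 2.03) from reducing the number of parameters, which is consistent with the abstract's use of architectural compression alongside quantization. The exact split between the two factors is not stated in the abstract.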