ISCA Archive Interspeech 2024

TfCleanformer: A streaming, array-agnostic, full- and sub-band modeling front-end for robust ASR

Jens Heitkaemper, Joe Caroselli, Arun Narayanan, Nathan Howard

Multiple recent publications have demonstrated the benefits of neural-network-based enhancement in the time-frequency domain. This paper builds on those findings to improve upon Cleanformer, a recently published streaming, array-agnostic multi-channel enhancement system. On a source separation task, the proposed streaming enhancement system outperforms Cleanformer and achieves results competitive with a non-causal state-of-the-art model. The presented model also improves upon Cleanformer's enhancement results in multiple challenging environments without introducing additional latency. A short ablation study evaluates how the proposed changes contribute to the improved performance.