ISCA Archive Interspeech 2020

Evolved Speech-Transformer: Applying Neural Architecture Search to End-to-End Automatic Speech Recognition

Jihwan Kim, Jisung Wang, Sangki Kim, Yeha Lee

Neural architecture search (NAS) has been successfully applied to finding efficient, high-performance deep neural network architectures in a task-adaptive manner without extensive human intervention, using genetic, reinforcement learning, or gradient-based algorithms as automated alternatives to manual architecture design. However, naively applying existing NAS algorithms to a new task may yield architectures that underperform manually designed ones. In this work, we show that NAS can provide efficient architectures that outperform manually designed attention-based architectures on speech recognition tasks; we name the resulting architecture the Evolved Speech-Transformer (EST). Combining a carefully designed search space with Progressive Dynamic Hurdles, a genetic-algorithm-based search method, our algorithm finds a memory-efficient architecture that outperforms the vanilla Transformer while requiring less training time.
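To give a rough sense of how Progressive Dynamic Hurdles fits into an evolutionary search loop, the Python sketch below shows one simplified variant: child architectures earn larger training-step budgets only by clearing a fitness hurdle. This is not the paper's implementation; `mutate` and `train_and_eval` are hypothetical placeholders for the task-specific mutation operator and the train-then-evaluate routine, and the hurdle schedule is simplified to the running mean fitness of the population.

```python
import random

def search_with_pdh(seed_archs, mutate, train_and_eval,
                    step_budgets=(10_000, 30_000, 100_000),
                    sample_size=5, num_children=200):
    """Toy evolutionary search loop with Progressive Dynamic Hurdles (PDH).

    Each child is first trained for the smallest step budget and is
    granted each successively larger budget only if its fitness clears
    a hurdle (here: the mean fitness of the current population).
    """
    # population: list of (architecture, fitness) pairs
    population = [(a, train_and_eval(a, step_budgets[0])) for a in seed_archs]

    for _ in range(num_children):
        # Tournament selection: mutate the fittest of a random sample.
        parent = max(random.sample(population, sample_size), key=lambda p: p[1])
        child = mutate(parent[0])

        fitness = train_and_eval(child, step_budgets[0])
        for budget in step_budgets[1:]:
            hurdle = sum(f for _, f in population) / len(population)
            if fitness < hurdle:
                break                                     # fails the hurdle: stop early
            fitness = train_and_eval(child, budget)       # earns more training steps

        population.append((child, fitness))
        population.pop(0)                                 # age out the oldest model

    return max(population, key=lambda p: p[1])[0]
```

The early-stopping structure is what makes the search affordable: weak candidates consume only the smallest training budget, so the full budget is spent almost entirely on promising architectures.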