ISCA Archive Interspeech 2023

GhostT5: Generate More Features with Cheap Operations to Improve Textless Spoken Question Answering

Xuxin Cheng, Zhihong Zhu, Ziyu Yao, Hongxiang Li, Yaowei Li, Yuexian Zou

Spoken question answering (SQA) aims to identify the correct answer to a given question from a spoken passage. Most conventional SQA frameworks combine an automatic speech recognition (ASR) module and a text question answering (TQA) module in a cascaded manner, which can suffer from error propagation and high latency. To tackle these issues, several end-to-end SQA frameworks based on Textless NLP have been proposed. However, existing end-to-end models still fail to outperform cascaded models with a similar number of parameters. In this paper, to improve textless SQA, we propose GhostT5, which uses very cheap operations to generate additional features from the remaining features, yielding stronger performance. Experimental results and further analysis show that GhostT5 achieves new state-of-the-art performance on the NMSQA dataset and surpasses cascaded SQA models. More encouragingly, GhostT5 surpasses the previous best end-to-end SQA model with fewer than half the parameters.