ISCA Archive Interspeech 2024
ISCA Archive Interspeech 2024

Prompt Tuning for Speech Recognition on Unknown Spoken Name Entities

Xizi Wei, Stephen McGregor

This paper explores the challenge of recognising relevant but previously unheard named entities in spoken input. This scenario pertains to real-world applications where establishing an automatic speech recognition (ASR) model trained on new entity phrases may not be efficient. We propose a technique that involves fine-tuning a Whisper model with a list of entity phrases as prompts. We establish a task-specific dataset where stratification of different entity phrases supports evaluation of three different scenarios in which entities might be encountered. We focus our analysis on a seen-but-unheard scenario, reflecting a situation where only textual representations of novel entity phrases are available for a commercial banking assistant bot. We show that a model tuned to anticipate prompts reflecting novel named entities makes substantial improvements in entity recall over non-tuned baseline models, and meaningful improvements in performance over models fine-tuned without a prompt.