ISCA Archive Interspeech 2024

Domain-Aware Data Selection for Speech Classification via Meta-Reweighting

Junghun Kim, Ka Hyun Park, Hoyoung Yoon, U Kang

Given speech data from diverse domains, how can we train an accurate classifier for a specific target domain by utilizing the other source domains? This problem commonly arises in real-world scenarios, such as identifying the intents of utterances from individuals with a specific speech disorder using utterances from individuals with other disorders. However, existing data selection methods for utilizing source instances face two main challenges: they do not account for the diversity of source domains, and their hard selection schemes may discard helpful source instances when the available information about the target domain is insufficient. In this work, we propose DOREME, a domain-aware data selection method for accurate speech classification on a target domain. The key idea is to softly select source instances by dynamically assigning an importance score to each instance based on two similarity measures: instance-scores and domain-scores. Experiments show that DOREME achieves the best classification accuracy.
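
To make the contrast between hard and soft selection concrete, the following is a minimal sketch and not DOREME itself. It assumes that each source instance already has an instance-score, that each source domain already has a domain-score, and that the two are combined by a simple product; the abstract does not specify the combination rule or how the scores are learned, so the function names and the product rule are illustrative assumptions.

```python
import numpy as np


def hard_selection(instance_scores, threshold):
    """Hard selection: keep (1.0) or drop (0.0) each source instance."""
    scores = np.asarray(instance_scores, dtype=float)
    return (scores >= threshold).astype(float)


def soft_weights(instance_scores, domain_scores, domain_ids):
    """Soft selection: every source instance keeps a non-negative weight.

    ASSUMPTION: the importance of an instance is the product of its
    instance-score and the score of the domain it belongs to, normalized
    to sum to 1. This rule is illustrative only.
    """
    instance_scores = np.asarray(instance_scores, dtype=float)
    domain_scores = np.asarray(domain_scores, dtype=float)
    domain_ids = np.asarray(domain_ids, dtype=int)

    combined = instance_scores * domain_scores[domain_ids]
    total = combined.sum()
    if total <= 0:
        # Fall back to uniform weights when all scores are zero.
        return np.full(len(combined), 1.0 / len(combined))
    return combined / total


def weighted_loss(per_instance_losses, weights):
    """Weighted sum of per-instance losses, used as the training loss."""
    losses = np.asarray(per_instance_losses, dtype=float)
    return float(np.dot(losses, weights))
```

With hypothetical scores for four source instances drawn from two source domains, `hard_selection` drops low-scoring instances entirely, whereas `soft_weights` down-weights them but still lets them contribute to `weighted_loss`, which is the behavior that soft selection is meant to preserve when target-domain information is scarce.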