ISCA Archive Interspeech 2024

Learning from Multiple Annotator Biased Labels in Multimodal Conversation

Kazutoshi Shinoda, Nobukatsu Hojo, Saki Mizuno, Keita Suzuki, Satoshi Kobashikawa, Ryo Masumura

In multimodal conversation analysis, annotating social signals such as speakers' communication skills is inherently subjective and prone to individual annotator bias, i.e., an annotator's tendency to assign labels according to their own values. These biases can skew label distributions toward the speakers and classes that match annotators' values, degrading classification performance for minority classes and speakers. Existing methods for addressing class imbalance and dataset bias often overlook the variable biases introduced by multiple annotators and can therefore overfit to the majority. We thus propose MAD-LM, a novel two-stage debiasing method that first learns the typical label distribution of each annotator and then promotes the learning of untypical labels. On a multimodal conversation dataset labeled by multiple annotators, MAD-LM effectively mitigates performance degradation for the minority while maintaining performance for the majority.
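The abstract describes the method only at a high level; as a rough illustration, the two-stage idea could be sketched in Python as below. Everything in this sketch is an assumption made for illustration: the function names, the additive smoothing, and the inverse-probability loss weighting are hypothetical and are not the authors' actual MAD-LM formulation.

import torch
import torch.nn.functional as F

def annotator_label_distributions(labels, annotator_ids,
                                  num_annotators, num_classes,
                                  smoothing=1.0):
    # Stage 1 (illustrative): estimate each annotator's empirical label
    # distribution -- their "typical" labels -- with additive smoothing.
    counts = torch.full((num_annotators, num_classes), smoothing)
    for a, y in zip(annotator_ids, labels):
        counts[a, y] += 1.0
    return counts / counts.sum(dim=1, keepdim=True)

def debiased_loss(logits, labels, annotator_ids, typical_dist):
    # Stage 2 (illustrative): weight the per-sample cross-entropy
    # inversely to how typical each label is for its annotator, so
    # untypical (minority) labels receive larger gradients.
    ce = F.cross_entropy(logits, labels, reduction="none")
    p_typical = typical_dist[annotator_ids, labels]  # P(label | annotator)
    weights = 1.0 / p_typical                        # rare for annotator -> large weight
    weights = weights / weights.mean()               # keep the loss scale stable
    return (weights * ce).mean()

# Toy usage: 3 classes, 2 annotators, 6 labeled examples.
labels = torch.tensor([0, 0, 1, 2, 0, 1])
annotators = torch.tensor([0, 0, 0, 1, 1, 1])
dist = annotator_label_distributions(labels, annotators,
                                     num_annotators=2, num_classes=3)
loss = debiased_loss(torch.randn(6, 3), labels, annotators, dist)

Normalizing the weights to unit mean keeps the overall loss on the same scale as standard cross-entropy, so up-weighting untypical labels does not destabilize training in this sketch.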