ISCA Archive Interspeech 2023

What are differences? Comparing DNN and Human by Their Performance and Characteristics in Speaker Age Estimation

Yuki Kitagishi, Naohiro Tawara, Atsunori Ogawa, Ryo Masumura, Taichi Asami

We compare speaker age estimation results obtained by human listeners and by a recent deep neural network (DNN) model to reveal differences in their estimation characteristics. DNN models can achieve high speaker age estimation performance and are expected to be used in practical applications. Only a few studies have compared speaker age estimation performance between human listeners and machine learning models, and the differences in their estimation characteristics have yet to be revealed. Our experimental results show that the DNN model performs comparably to or better than the listeners but is more sensitive than the listeners to elderly speech, acoustic characteristics, and the length of speech samples. The results also show that the speakers' gender and some specific acoustic features negatively affect the listeners' estimation performance.