ISCA Archive Interspeech 2023
ISCA Archive Interspeech 2023

Do Vocal Breath Sounds Encode Gender Cues for Automatic Gender Classification?

Mohammad Shaique Solanki, Ashutosh Bharadwaj, Jeevan Kylash, Prasanta Kumar Ghosh

The acoustic features of continuous speech, such as pitch (F0) and formant frequencies (F1, F2) have been utilized for gender classification. However, non-speech signals including vocal breath sounds have not been explored due to the absence of gender-specific acoustic features. This study investigates if vocal breath sounds carry gender information and if they can be used for automatic gender classification. The study examines the use of data-driven and knowledge-based features from breath sounds, classifier complexity, and the importance of breath signal segment location and duration. Results from experiments on 54 minutes of male and 52 minutes of female breath sounds demonstrate that classifiers with low-complexity and knowledge-based features (MFCC statistics) perform similarly to high-complexity classifiers with data-driven features. Breath segments of around 3 seconds are found to be the most suitable choice regardless of location, eliminating the need for breath cycle boundary marking.