ISCA Archive Interspeech 2023

Whisper Encoder features for Infant Cry Classification

Monil Charola, Aastha Kachhi, Hemant A. Patil

Identifying pathology from infant cry is an important and socially relevant research problem, as it can save the lives of many infants. This study proposes a transfer learning-based approach using the Whisper encoder module, which is compared against the state-of-the-art MFCC feature set for classification of normal vs. pathological infant cry. We also present multi-class pathological infant cry classification using CNN and Bi-LSTM networks. Our study finds that Whisper encoder features coupled with DNN classifiers such as CNN and Bi-LSTM outperform MFCC features, with absolute improvements of 4% and 1% for CNN and Bi-LSTM, respectively. Furthermore, the Whisper encoder features are analysed using statistical parameters and t-SNE plots. The experiments are performed using 10-fold cross-validation on the Baby Chillanto dataset, the in-house DA-IICT dataset, and datasets formed by combining these two.
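The abstract reports results under 10-fold cross-validation but does not describe how the folds are built. The following is a minimal sketch of that evaluation protocol, assuming plain (non-stratified) shuffled folds with a fixed seed and accuracy averaged over folds. The names `k_fold_indices`, `cross_validated_accuracy`, and the majority-class stand-in are hypothetical illustrations, not the authors' code; in the study, the `train_and_predict` callable would instead train a CNN or Bi-LSTM on Whisper encoder or MFCC features.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k (train, test) pairs.

    Assumption: plain shuffled k-fold (not stratified). Every index appears in
    exactly one test fold, and fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible folds
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


def cross_validated_accuracy(
    labels: Sequence[int],
    train_and_predict: Callable[[List[int], List[int]], List[int]],
    k: int = 10,
    seed: int = 0,
) -> float:
    """Mean test accuracy over k folds (each fold weighted equally)."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(labels), k, seed):
        preds = train_and_predict(train_idx, test_idx)
        correct = sum(p == labels[i] for p, i in zip(preds, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    # Hypothetical toy labels: 0 = normal cry, 1 = pathological cry.
    toy_labels = [0] * 70 + [1] * 30

    def majority_baseline(train_idx: List[int], test_idx: List[int]) -> List[int]:
        # Placeholder classifier: predicts the most frequent training label.
        majority = Counter(toy_labels[i] for i in train_idx).most_common(1)[0][0]
        return [majority] * len(test_idx)

    print(cross_validated_accuracy(toy_labels, majority_baseline))
```

Keeping the fold construction separate from the classifier means the same splits can be reused to compare Whisper encoder features against MFCC features on an equal footing.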