ISCA Archive issp 2024

Exploration and classification of vocal fry, period doubling, and modal voice using acoustic and EGG measures

Yaqian Huang
How subtypes of creaky voice, such as vocal fry and period doubling, can be classified according to their acoustic and phonatory correlates is not entirely clear. This study explores how these creaky voice types differ from modal voice. It uses machine classification methods to assess the importance of source and filter characteristics, as represented by acoustic and electroglottographic (EGG) measures. Tokens of vocal fry, period doubling, and modal voice were visually identified using EGG in a scripted Mandarin corpus, since these non-modal voice qualities occur abundantly across Mandarin tones. To address multicollinearity and overfitting, the multinomial logistic regression was fitted with L1 regularization (Lasso). Random forest models were also used to predict these voicing types, and their results were compared with those of the logistic models. Adding the EGG measures substantially improved the performance of all models. This improvement is supported both by the separable clusters in exploratory visualization and by the macro-averaged precision and recall scores. According to the random forest models, the most important measures were f0, H1-H2, H1, SoE, H2, and HNR (0-500 Hz), as well as the duration of the decontacting phase and the contact quotient of the glottal pulse. Implications for the relationship between human perception and phonatory measures are discussed.
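Model performance above is summarized with macro-averaged precision and recall. The sketch below shows in plain Python how these two scores are computed for a three-way voice-quality classification: per-class precision and recall are calculated first, then averaged with equal weight per class. This is the same quantity scikit-learn reports with `average="macro"`. The function name `macro_precision_recall`, the label strings (`"modal"`, `"fry"`, `"pd"` for period doubling), and the toy labels are hypothetical illustrations, not the study's code or data.

```python
from collections import Counter


def macro_precision_recall(y_true, y_pred, labels):
    """Return (macro precision, macro recall) over the given class labels.

    Hypothetical helper: each class contributes equally to the average,
    regardless of how many tokens it has.
    """
    # Count each (true label, predicted label) pair once.
    pairs = Counter(zip(y_true, y_pred))
    precisions, recalls = [], []
    for c in labels:
        tp = pairs[(c, c)]
        predicted_c = sum(n for (t, p), n in pairs.items() if p == c)
        actual_c = sum(n for (t, p), n in pairs.items() if t == c)
        # Assumed convention: a class never predicted (or never present) scores 0.
        precisions.append(tp / predicted_c if predicted_c else 0.0)
        recalls.append(tp / actual_c if actual_c else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


# Toy labels for illustration only (not data from the study).
y_true = ["modal", "modal", "modal", "fry", "fry", "pd"]
y_pred = ["modal", "modal", "fry", "fry", "pd", "pd"]
p, r = macro_precision_recall(y_true, y_pred, ["modal", "fry", "pd"])
print(f"macro precision={p:.3f}, macro recall={r:.3f}")
# prints "macro precision=0.667, macro recall=0.722"
```

Macro averaging gives every voice-quality class the same weight. A less frequent class therefore affects the score as much as a frequent one, which a pooled (micro-averaged) score would not guarantee.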