Exploration and classification of vocal fry, period doubling, and modal voice using acoustic and EGG measures
Yaqian Huang
It is not entirely clear how subtypes of creaky voice, such as
vocal fry and period doubling, can be classified by their
acoustic and phonatory correlates. This study examines how
these creaky voice types differ from modal voice, using
machine-learning classification methods to assess the
relative importance of source and filter characteristics, as
captured by acoustic and electroglottographic (EGG) measures.
Tokens of vocal fry, period doubling, and modal voice were
identified visually from the EGG signal in a scripted Mandarin
corpus, since these non-modal voice qualities were found to be
abundant across Mandarin tones.
To address multicollinearity and overfitting, L1 (Lasso)
regularization was used to fit the multinomial logistic
regression. Random forest models were also used to predict
these voicing types and were compared with the logistic
models. Adding the EGG measures substantially improved the
performance of all models, as shown both by the separable
clusters in exploratory visualizations and by the
macro-averaged precision and recall scores. The most important
measures according to the random forest models were f0,
H1-H2, H1, SoE, H2, and HNR (0-500 Hz), as well as the
duration of the decontacting phase and the contact quotient of
the glottal pulse. Implications for the relationship between
human perception and phonatory measures are discussed.
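The two model families compared in this study can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the study's pipeline. The data are random placeholders in which only the hypothetical `f0` and `H1-H2` columns carry class information. The feature names, the penalty strength `C=0.5`, the standardization step, and the forest size are also illustrative choices. The L1 penalty can shrink coefficients of redundant, correlated measures to exactly zero. The forest's impurity-based `feature_importances_` give the kind of measure ranking reported in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical column names standing in for the measures named in the abstract.
FEATURES = ["f0", "H1-H2", "H1", "SoE", "H2", "HNR_0_500", "decontact_dur", "CQ"]
CLASSES = ["modal", "vocal_fry", "period_doubling"]


def make_synthetic_tokens(n_per_class=100, seed=0):
    """Random placeholder data (NOT real measurements): only the f0 and
    H1-H2 columns differ between classes; the other six are pure noise."""
    rng = np.random.default_rng(seed)
    X_parts, y = [], []
    for k, label in enumerate(CLASSES):
        mean = np.zeros(len(FEATURES))
        mean[0] = 2.0 * k    # class shift in the "f0" column
        mean[1] = -1.5 * k   # class shift in the "H1-H2" column
        X_parts.append(rng.normal(mean, 1.0, size=(n_per_class, len(FEATURES))))
        y.extend([label] * n_per_class)
    return np.vstack(X_parts), np.array(y)


def fit_models(X, y, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # L1 (Lasso) penalty; the 'saga' solver supports it and fits a
    # multinomial model when there are more than two classes.
    # Standardizing first puts all measures on a common scale.
    lasso = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="saga", C=0.5,
                           max_iter=5000, random_state=seed),
    )
    forest = RandomForestClassifier(n_estimators=300, random_state=seed)
    lasso.fit(X_tr, y_tr)
    forest.fit(X_tr, y_tr)
    return lasso, forest, X_te, y_te


X, y = make_synthetic_tokens()
lasso, forest, X_te, y_te = fit_models(X, y)

# Impurity-based importances from the forest, highest first.
ranking = sorted(zip(FEATURES, forest.feature_importances_), key=lambda p: -p[1])
# Measures whose Lasso coefficient is non-zero for at least one class.
coef = lasso[-1].coef_  # shape: (n_classes, n_features)
kept = [f for f, w in zip(FEATURES, np.abs(coef).sum(axis=0)) if w > 0]

print("RF ranking:", [name for name, _ in ranking])
print("Lasso kept:", kept)
print("accuracy (Lasso, RF):", lasso.score(X_te, y_te), forest.score(X_te, y_te))
```

Impurity-based importances can favor continuous, high-cardinality predictors. Permutation importance (`sklearn.inspection.permutation_importance`) is a common cross-check.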
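The abstract reports macro-averaged precision and recall scores. Macro averaging computes precision and recall separately for each voicing type and then takes their unweighted mean. As a result, a rarely occurring type counts as much as a frequent one. Below is a minimal pure-Python sketch. The labels are invented for illustration. Scoring a never-predicted class as precision 0 is an assumed convention.

```python
from collections import Counter


def macro_precision_recall(y_true, y_pred, labels):
    """Unweighted (macro) mean of per-class precision and recall.

    Each voicing type counts equally, however many tokens it has.
    Assumed convention: a class that is never predicted (or never occurs)
    contributes 0 to the corresponding average.
    """
    true_n = Counter(y_true)
    pred_n = Counter(y_pred)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    precision = [hits[c] / pred_n[c] if pred_n[c] else 0.0 for c in labels]
    recall = [hits[c] / true_n[c] if true_n[c] else 0.0 for c in labels]
    return sum(precision) / len(labels), sum(recall) / len(labels)


# Toy labels (invented for illustration, not from the corpus).
y_true = ["modal"] * 4 + ["fry"] * 2 + ["pd"] * 2
y_pred = ["modal", "modal", "modal", "fry", "fry", "modal", "pd", "fry"]
p, r = macro_precision_recall(y_true, y_pred, ["modal", "fry", "pd"])
print(round(p, 3), round(r, 3))  # 0.694 0.583
```

Here plain accuracy is 5/8 = 0.625. Macro precision (25/36) and macro recall (7/12) instead weight the two-token "fry" and "pd" classes equally with the four-token "modal" class.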
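The contact quotient and the decontacting-phase duration are both derived from the EGG waveform. The abstract does not state their exact definitions, so the sketch below assumes a common criterion-level approach. Each cycle is normalized to [0, 1], and samples at or above a threshold count as contact (25% here, an assumed value). The contact quotient is the proportion of the period spent in contact. The decontacting phase is taken, hypothetically, to run from the contact peak to the last contact sample. The sketch also assumes the input holds exactly one period that begins in the open phase, with larger values meaning more vocal-fold contact.

```python
def egg_cycle_measures(cycle, fs, threshold=0.25):
    """Return (contact quotient, decontacting duration in ms) for one EGG cycle.

    Hypothetical definitions, not taken from the study:
    - contact: normalized amplitude >= threshold (criterion-level method);
    - decontacting phase: from the contact peak to the last contact sample.
    Assumes `cycle` is exactly one period that starts in the open phase,
    with larger values meaning more vocal-fold contact.
    """
    lo, hi = min(cycle), max(cycle)
    if hi == lo:
        raise ValueError("flat cycle: no contact information")
    norm = [(v - lo) / (hi - lo) for v in cycle]
    contact = [i for i, v in enumerate(norm) if v >= threshold]
    cq = len(contact) / len(norm)
    peak = max(range(len(norm)), key=norm.__getitem__)
    decontact_ms = (contact[-1] - peak) / fs * 1000.0
    return cq, decontact_ms


# Synthetic asymmetric cycle: 100 samples at 10 kHz (f0 = 100 Hz),
# fast rise over 10 samples, slower fall over 30, then an open phase.
fs = 10_000
cycle = ([i / 10 for i in range(10)]
         + [(40 - i) / 30 for i in range(10, 41)]
         + [0.0] * 59)
cq, dec_ms = egg_cycle_measures(cycle, fs)
print(round(cq, 2), round(dec_ms, 2))  # 0.3 2.2
```

Published EGG studies use different thresholds and hybrid criteria, so values from this sketch are only comparable with others computed the same way.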