ISCA Archive Interspeech 2017

Iterative Optimal Preemphasis for Improved Glottal-Flow Estimation by Iterative Adaptive Inverse Filtering

Parham Mokhtari, Hiroshi Ando

Iterative adaptive inverse filtering (IAIF) [1] remains among the state-of-the-art algorithms for estimating glottal flow from the recorded speech signal. Here, we re-examine IAIF in light of its foundational, classical model of voiced (non-nasalized) speech, wherein the overall spectral tilt is caused only by lip-radiation and glottal effects, while the vocal-tract transfer function contains formant peaks but is otherwise not tilted. In contrast, IAIF initially models and cancels the formants after only a first-order preemphasis of the speech signal, which is generally not enough to completely remove spectral tilt.

Iterative optimal preemphasis (IOP) is therefore proposed to replace IAIF’s initial step. IOP is a rapidly converging algorithm that models a signal with one real pole at a time, inverse-filtering it with the corresponding real zero, until the spectral tilt is flattened. IOP-IAIF is evaluated on sustained /a/ in a range of voice qualities from weak-breathy to shouted-tense. Compared with standard IAIF, IOP-IAIF yields: (i) an acceptable glottal flow even for a weak breathy voice that the standard algorithm failed to handle; (ii) generally smoother glottal flows that nevertheless retain pulse shape and closed phase; and (iii) enhanced separation of voice qualities in both normalized amplitude quotient (NAQ) and glottal harmonic spectra.
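The one-pole-at-a-time idea behind IOP can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `iterative_preemphasis`, the autocorrelation-based first-order fit, the stopping rule (coefficient magnitude below `tol`), and the default parameter values are all assumptions made here for illustration, since the abstract does not specify them.

```python
import numpy as np

def iterative_preemphasis(x, tol=1e-3, max_iter=50):
    """Hypothetical sketch of iterative first-order preemphasis.

    Each pass fits a single real pole to the signal (first-order linear
    prediction via the autocorrelation method) and cancels it with the
    matching real zero. The stopping rule is an assumption: iteration
    ends when the fitted coefficient is close to zero, i.e. when
    first-order modeling finds no remaining spectral tilt.
    """
    y = np.asarray(x, dtype=float).copy()
    coeffs = []
    for _ in range(max_iter):
        r0 = np.dot(y, y)            # lag-0 autocorrelation (energy)
        if r0 == 0.0:
            break                    # silent signal: nothing to model
        r1 = np.dot(y[1:], y[:-1])   # lag-1 autocorrelation
        a = r1 / r0                  # optimal first-order predictor coefficient
        if abs(a) < tol:
            break                    # tilt is (nearly) flat by this criterion
        coeffs.append(a)
        # Inverse filter with one real zero: y[n] <- y[n] - a * y[n-1],
        # assuming a zero initial state for the first sample.
        y = np.concatenate(([y[0]], y[1:] - a * y[:-1]))
    return y, coeffs
```

In this sketch, the returned `coeffs` describe a cascade of first-order zeros that together act as the overall preemphasis filter, and `y` is the flattened signal that would be passed on to formant modeling.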