ISCA Archive ICSLP 2002

Adaptive estimation of time-varying features from high-pitched speech based on an excitation source HMM

Akira Sasou, Kazuyo Tanaka

This paper describes a method for extracting time-varying features that is effective for speech signals with high fundamental frequencies. The proposed method adopts a speech production model consisting of a Time-Varying Auto-Regressive (TVAR) process for the articulatory filter and a Hidden Markov Model (HMM) for the excitation source. The model represents waveform amplitude variations through a time-varying gain of the excitation source. The proposed algorithm extends the Viterbi algorithm so that it adaptively estimates the TVAR coefficients and the time-varying gain while decoding the state transitions of the excitation source HMM. We applied the proposed method to extract time-varying features from both synthetic and natural speech and confirmed its feasibility.
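
To make the structure of the speech production model concrete, the following is a minimal generative sketch in Python, not the estimation algorithm itself. It assumes a recursion of the form s[t] = sum_k a_k(t) s[t-k] + g(t) e[t], where e[t] is emitted by a cyclic excitation HMM. The sign convention, the deterministic cyclic transitions, the Gaussian emission parameters, and the names `synthesize`, `moving_resonance`, and `slow_gain` are all illustrative assumptions that do not come from the paper.

```python
import math
import random


def synthesize(n_samples, period, ar_coeffs_at, gain_at, seed=0):
    """Generate samples from a toy source-filter model (illustrative only).

    Excitation: a cyclic HMM with `period` states. State 0 emits a
    pulse-like Gaussian (mean 1); all other states emit near-zero noise.
    Filter: time-varying AR, s[t] = sum_k a_k(t) * s[t-k] + g(t) * e[t].
    """
    rng = random.Random(seed)
    means = [1.0] + [0.0] * (period - 1)  # assumed emission means per state
    std = 0.01                            # assumed emission standard deviation
    state = 0
    signal, states = [], []
    for t in range(n_samples):
        e = rng.gauss(means[state], std)  # excitation sample from current state
        a = ar_coeffs_at(t)               # TVAR coefficients at time t
        past = sum(a_k * signal[t - 1 - k]
                   for k, a_k in enumerate(a) if t - 1 - k >= 0)
        signal.append(past + gain_at(t) * e)
        states.append(state)
        state = (state + 1) % period      # assumed deterministic cyclic transition
    return signal, states


def moving_resonance(t):
    """Hypothetical two-pole resonance whose centre frequency drifts slowly."""
    r, theta = 0.95, 0.3 + 0.001 * t
    return [2.0 * r * math.cos(theta), -r * r]


def slow_gain(t):
    """Hypothetical slowly varying excitation gain (amplitude envelope)."""
    return 1.0 + 0.5 * math.sin(2.0 * math.pi * t / 400.0)


# A short period (20 samples) stands in for a high fundamental frequency.
signal, states = synthesize(400, 20, moving_resonance, slow_gain)
```

In this sketch the estimation problem addressed by the paper corresponds to recovering `ar_coeffs_at`, `gain_at`, and `states` from `signal` alone. With a short pitch period the filter response to one pulse has not decayed before the next pulse arrives, which is one way to see why the excitation must be modelled jointly with the filter.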