ISCA Archive Interspeech 2015
ISCA Archive Interspeech 2015

Robust pitch estimation in noisy speech using ZTW and group delay function

RaviShankar Prasad, B. Yegnanarayana

Identification of pitch for speech signals recorded in noisy environments is a fundamental and long persistent problem in speech research. Several time domain based techniques attempt to exploit the periodic nature of the waveform using autocorrelation function and its variants. Other set of techniques utilize the harmonic structure in the spectral domain to identify pitch values. Either of these techniques suffer significant degradation in their performance in cases of noisy speech signals with low SNRs. The paper presents a robust technique to identify pitch values for speech signals. The proposed algorithm utilizes a speech analysis method called zero-time windowing (ZTW) where the signal is processed using a heavily decaying window, and the spectral characteristics are highlighted using the numerator of the group delay function. The amplitude contour of dominant resonances in the spectra are extracted, and processed further using a Gaussian window. The resulting contour reflects the energy profile of the signal which is utilized for estimation of the pitch values. The proposed algorithm is robust to degradations, and has been tested on several utterances with added noises. The algorithm exhibits significant increment in performance when compared to existing techniques.