ISCA Archive SpeechProsody 2012

Multi methods pitch tracking

Philippe Martin

Building large corpora of spontaneous speech frequently involves collecting recordings of poor acoustic quality, which can compromise acoustic analysis and, in particular, fundamental frequency (F0) tracking. F0 analysis is especially sensitive to distortion caused by a low signal-to-noise ratio, filtering of low frequencies, encoding in compressed formats (mp3, wma, …), room echo, and external sound sources (car engine, overlapping speech segments, etc.). No single F0 algorithm is the most reliable everywhere: some perform better than others on specific voiced segments, depending on complex characteristics such as the rate of F0 change, the intensity of the first harmonic, the presence of echo, etc. For that reason, a system (implemented in the software package WinPitch) is proposed that allows the user to choose among various tracking algorithms, adjust their parameters, and apply the selected method to the speech segments under consideration. The user is guided in this operation by an underlying narrow-band spectrogram, which allows the validity of the local F0 analysis to be checked visually by comparing the F0 curve with the low harmonics of the spectrogram.

Index Terms: speech prosody, fundamental frequency tracking, intonation.
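
The per-segment selection described in the abstract can be illustrated with a minimal sketch. This is not the WinPitch implementation: the two estimators (autocorrelation peak picking and harmonic summation), the names `TRACKERS`, `track_segments` and `synth`, the parameter defaults, and the synthetic test signal are all assumptions introduced here for illustration. The list of segments stands in for the choices a user would make interactively while looking at the spectrogram.

```python
import numpy as np

def f0_autocorr(frame, sr, fmin=60.0, fmax=500.0):
    """Estimate F0 from the autocorrelation peak within the lag range [sr/fmax, sr/fmin]."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag

def f0_harmonic_sum(frame, sr, fmin=60.0, fmax=500.0, n_harm=5):
    """Estimate F0 as the candidate whose first n_harm harmonics carry the most spectral magnitude."""
    n_fft = 1 << 15  # zero-padding gives a finer frequency grid
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft))
    candidates = np.arange(fmin, fmax, 0.5)
    harmonics = np.arange(1, n_harm + 1)
    bins = np.rint(np.outer(candidates, harmonics) * n_fft / sr).astype(int)
    scores = spec[bins].sum(axis=1)
    return float(candidates[int(np.argmax(scores))])

# Hypothetical registry of selectable tracking methods.
TRACKERS = {"autocorr": f0_autocorr, "harmonic_sum": f0_harmonic_sum}

def track_segments(signal, sr, segments, frame_len=0.04, hop=0.01):
    """Apply a chosen tracker to each segment.

    segments: list of (start_s, end_s, method_name, params_dict).
    Returns a list of (time_s, f0_hz) pairs, one per analysis frame.
    """
    n, h = int(frame_len * sr), int(hop * sr)
    curve = []
    for start, end, method, params in segments:
        tracker = TRACKERS[method]
        for i in range(int(start * sr), int(end * sr) - n + 1, h):
            curve.append((i / sr, float(tracker(signal[i:i + n], sr, **params))))
    return curve

def synth(f0, dur, sr, n_harm=5):
    """Synthetic voiced-like test tone: harmonics of f0 with amplitudes 1/h."""
    t = np.arange(int(dur * sr)) / sr
    return sum(np.sin(2 * np.pi * h * f0 * t) / h for h in range(1, n_harm + 1))

sr = 16000
signal = np.concatenate([synth(120.0, 0.5, sr), synth(220.0, 0.5, sr)])

# A different method and parameter set is assigned to each segment.
segments = [
    (0.0, 0.5, "autocorr", {"fmax": 400.0}),
    (0.5, 1.0, "harmonic_sum", {"n_harm": 4}),
]
curve = track_segments(signal, sr, segments)
```

Keeping each tracker behind the same `(frame, sr, **params)` signature is what makes the methods interchangeable per segment; a real system would also need a voicing decision and a way to overlay the resulting curve on the spectrogram, which this sketch omits.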