ISCA Archive Interspeech 2022

Robust Pitch Estimation Using Multi-Branch CNN-LSTM and 1-Norm LP Residual

Mudit D. Batra, Jayesh, C.S. Ramalingam

Pitch and voicing determination are important in many speech and audio signal processing applications. Estimating them can be difficult even for clean signals, and more so when noise is present. In this paper we propose a Multi-Branch CNN-LSTM based Temporal Neural Network for pitch and voicing determination. In addition, we use the ℓ1-norm based LP residual, rather than the raw waveform, as the input signal. These changes make the proposed method more robust to SNR degradation: although accuracy falls slightly for clean signals, raw pitch accuracy (RPA) at 0 dB SNR is 2.9% (absolute) higher than that of the CREPE algorithm. More importantly, the drop in accuracy is smaller when the RPA tolerance is tightened. This robustness is achieved with only 1.79M parameters, an order of magnitude fewer than CREPE uses.
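The abstract relies on two computations that may be easier to follow in code: the ℓ1-norm LP residual used as network input, and the RPA metric used for evaluation. The sketch below is illustrative only and is not the authors' implementation. It assumes covariance-method prediction over a single frame, and it solves the ℓ1 problem with iteratively reweighted least squares (IRLS). IRLS is one common solver for ℓ1 LP, but the paper may use a different one, such as linear programming. The prediction order (`order=10`), `n_iter`, `eps`, and the 50-cent default tolerance are assumed values. The function names are hypothetical.

```python
import numpy as np


def l1_lp_coefficients(x, order=10, n_iter=20, eps=1e-6):
    """Hypothetical helper: LP coefficients minimizing sum |e[n]| via IRLS (assumed solver)."""
    x = np.asarray(x, dtype=float)
    # Covariance-method data matrix: the row for sample n holds x[n-1], ..., x[n-order].
    X = np.column_stack([x[order - k:len(x) - k] for k in range(1, order + 1)])
    y = x[order:]
    # Start from the ordinary least-squares (l2) solution.
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(n_iter):
        e = y - X @ a
        # Weights 1/|e| make the weighted l2 cost equal the l1 cost at the fixed point;
        # eps avoids division by zero for near-zero residuals.
        w = 1.0 / np.maximum(np.abs(e), eps)
        Xw = X * w[:, None]
        a = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return a


def l1_lp_residual(x, order=10, **kwargs):
    """Hypothetical helper: apply the inverse filter A(z) = 1 - sum_k a_k z^-k to the frame."""
    x = np.asarray(x, dtype=float)
    a = l1_lp_coefficients(x, order, **kwargs)
    inverse_filter = np.concatenate(([1.0], -a))
    # Keep the first len(x) outputs; samples before the frame are treated as zero.
    return np.convolve(x, inverse_filter)[:len(x)]


def raw_pitch_accuracy(f0_ref, f0_est, tol_cents=50.0):
    """Fraction of reference-voiced frames whose estimate is within tol_cents.

    Simplification (assumption): frames with f0_est <= 0 count as errors.
    """
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_est = np.asarray(f0_est, dtype=float)
    voiced = f0_ref > 0
    both = voiced & (f0_est > 0)
    cents = np.full(f0_ref.shape, np.inf)
    cents[both] = 1200.0 * np.abs(np.log2(f0_est[both] / f0_ref[both]))
    return np.count_nonzero(cents[voiced] <= tol_cents) / np.count_nonzero(voiced)
```

Tightening the tolerance (for example, `tol_cents=25` or `10`) counts fewer frames as correct. This is the stricter RPA setting the abstract refers to. Minimizing the ℓ1 norm rather than the l2 norm penalizes large prediction errors less heavily, so the fit is less distorted by the sparse, spiky excitation of voiced speech.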