ISCA Archive Interspeech 2012

Robust pitch estimation using l1-regularized maximum likelihood estimation

Feng Huang, Tan Lee

This paper presents a new method of robust pitch estimation using sparsity-based estimation techniques. The method is based on a sparse representation of a temporal-spectral pitch feature. This robust pitch feature is obtained by accumulating spectral peaks over consecutive frames, and is expressed as a sparse linear combination of an over-complete set of peak spectrum exemplars. The noise is assumed to follow a Gaussian distribution with non-zero mean. The weights of the linear combination are estimated by maximizing the likelihood of the feature under a sparsity constraint, which is incorporated as an l1 regularization term. From the estimated weights, the major constituent exemplars are identified and the fundamental frequency is determined. Experimental results show that this method significantly improves pitch estimation accuracy, particularly at low signal-to-noise ratios.
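The estimation step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the feature is modeled as y = A w + n with i.i.d. Gaussian noise of known mean, so that maximizing the likelihood under an l1 penalty reduces to a lasso problem on the mean-shifted feature. The solver (ISTA), the Gaussian-bump exemplars, and all names and parameter values (`make_exemplars`, `l1_ml_weights`, `lam`, `noise_mean`, the frequency grids) are hypothetical choices made for illustration only.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def make_exemplars(f0_grid, freqs, n_harm=5, width=10.0):
    """Hypothetical peak-spectrum exemplars: one unit-norm column per
    candidate F0, with Gaussian bumps at its first n_harm harmonics."""
    cols = []
    for f0 in f0_grid:
        col = np.zeros_like(freqs)
        for h in range(1, n_harm + 1):
            col += np.exp(-0.5 * ((freqs - h * f0) / width) ** 2)
        cols.append(col / np.linalg.norm(col))
    return np.stack(cols, axis=1)

def l1_ml_weights(y, A, noise_mean=0.0, lam=0.05, n_iter=500):
    """ISTA for min_w 0.5 * ||y - noise_mean - A w||^2 + lam * ||w||_1.
    Under the assumed Gaussian noise model this is the l1-regularized
    negative log-likelihood, up to constants and scaling."""
    r = y - noise_mean                    # remove the assumed noise mean
    L = np.linalg.norm(A, 2) ** 2         # Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - r)
        w = soft_threshold(w - grad / L, lam / L)
    return w

# Synthetic example: one exemplar at 200 Hz plus Gaussian noise with non-zero mean.
freqs = np.arange(0.0, 2000.0, 5.0)       # frequency bins in Hz (assumed)
f0_grid = np.arange(80.0, 400.0, 10.0)    # candidate F0 values in Hz (assumed)
A = make_exemplars(f0_grid, freqs)

rng = np.random.default_rng(0)
noise_mean = 0.05
true_idx = int(np.argmin(np.abs(f0_grid - 200.0)))
y = A[:, true_idx] + noise_mean + 0.01 * rng.standard_normal(freqs.size)

w = l1_ml_weights(y, A, noise_mean=noise_mean)
f0_hat = f0_grid[int(np.argmax(np.abs(w)))]  # dominant exemplar gives the F0 estimate
print(f0_hat)
```

In this sketch the fundamental frequency is read off the exemplar with the largest weight; how the paper identifies the major constituent exemplars and maps them to a pitch value may differ.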

Index Terms: Robust pitch estimation, speech sparsity, l1 regularization, peak spectrum


doi: 10.21437/Interspeech.2012-137

Cite as: Huang, F., Lee, T. (2012) Robust pitch estimation using l1-regularized maximum likelihood estimation. Proc. Interspeech 2012, 378-381, doi: 10.21437/Interspeech.2012-137

@inproceedings{huang12b_interspeech,
  author={Feng Huang and Tan Lee},
  title={{Robust pitch estimation using l1-regularized maximum likelihood estimation}},
  year=2012,
  booktitle={Proc. Interspeech 2012},
  pages={378--381},
  doi={10.21437/Interspeech.2012-137},
  issn={2958-1796}
}