ISCA Archive Interspeech 2006

Extension and further analysis of higher order cepstral moment normalization (HOCMN) for robust features in speech recognition

Chang-wen Hsu, Lin-shan Lee

Cepstral normalization is widely used as a powerful approach to producing robust features for speech recognition. Well-known examples include Cepstral Mean Subtraction (CMS) and Cepstral Mean and Variance Normalization (CMVN), in which either the first moment alone or both the first and second moments of the Mel-frequency Cepstral Coefficients (MFCCs) are normalized [1, 2]. These approaches were previously extended to Higher Order Cepstral Moment Normalization (HOCMN), which normalizes moments of orders much higher than two [3]. Here we further extend HOCMN to a more general form by defining generalized moments of non-integer order. Extensive experimental results on a newly defined development set for AURORA 2.0 indicate not only that HOCMN with integer moment orders can perform significantly better than the well-known Histogram Equalization (HEQ) approach, but also that further improvements can be obtained consistently for almost all SNR values with non-integer moment orders. The theoretical foundation of the proposed approaches is also discussed, explaining why HOCMN performs well and how the statistical properties of the MFCC distributions are adjusted during normalization.
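
To make the moment terminology concrete, the following is a minimal sketch of the two baseline methods named above, CMS and CMVN, applied per utterance to a list of feature frames. It is not the HOCMN algorithm from the paper. The function names, the toy data, and in particular `abs_moment` (an absolute moment of real-valued order, shown only to illustrate what a non-integer order could mean) are assumptions made for illustration; the paper's own definition of the generalized moment may differ.

```python
import math


def cms(frames):
    """Cepstral Mean Subtraction: zero the first moment of each coefficient."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


def cmvn(frames, eps=1e-12):
    """Cepstral Mean and Variance Normalization: zero mean, unit variance."""
    centered = cms(frames)
    n = len(centered)
    dims = len(centered[0])
    # Population standard deviation of each (already centered) coefficient.
    stds = [math.sqrt(sum(f[d] ** 2 for f in centered) / n) for d in range(dims)]
    return [[f[d] / max(stds[d], eps) for d in range(dims)] for f in centered]


def moment(frames, dim, order):
    """Sample moment of integer order for one coefficient."""
    return sum(f[dim] ** order for f in frames) / len(frames)


def abs_moment(frames, dim, order):
    """Hypothetical absolute moment of real-valued order (illustration only)."""
    return sum(abs(f[dim]) ** order for f in frames) / len(frames)


# Toy "utterance": four frames with two coefficients each (made-up values).
frames = [[1.0, 10.0], [2.0, 14.0], [3.0, 18.0], [6.0, 22.0]]
normalized = cmvn(frames)

print(moment(normalized, 0, 1))        # first moment after CMVN, close to 0
print(moment(normalized, 0, 2))        # second moment after CMVN, close to 1
print(moment(normalized, 0, 3))        # third moment is left unnormalized
print(abs_moment(normalized, 0, 2.5))  # an absolute moment of non-integer order
```

In this toy data the first coefficient is skewed, so its third moment stays clearly nonzero after CMVN. Moments of order above two are what the abstract describes HOCMN as normalizing.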


doi: 10.21437/Interspeech.2006-11

Cite as: Hsu, C.-w., Lee, L.-s. (2006) Extension and further analysis of higher order cepstral moment normalization (HOCMN) for robust features in speech recognition. Proc. Interspeech 2006, paper 1748-Mon1A2O.5, doi: 10.21437/Interspeech.2006-11

@inproceedings{hsu06_interspeech,
  author={Chang-wen Hsu and Lin-shan Lee},
  title={{Extension and further analysis of higher order cepstral moment normalization (HOCMN) for robust features in speech recognition}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1748-Mon1A2O.5},
  doi={10.21437/Interspeech.2006-11},
  issn={2958-1796}
}