ISCA Archive ISCSLP 2002

Structure-based compensation using an improved statistical linear approximation for Mandarin speech recognition over telephone

Zhao-Bing Han, Hua-Yun Zhang, Bo Xu

In this paper, a Vector Piecewise Polynomial (VPP) approximation algorithm is proposed for robust speech recognition in telecommunication environments. The method is formulated in a statistical framework so as to compensate optimally for noise effects, given the observed noisy speech, a model describing the statistics of speech recorded in a clean reference environment, and an estimate of the noisy recognition environment.

The VPP algorithm extends P. J. Moreno's Vector Taylor Series (VTS) approximation for handling distortion due to channel effects and background noise. Moreno replaced the environment function f(v) by its vector Taylor series approximation, but it is well known that VTS is imprecise when the variable v is not close to the Taylor expansion point v0. VPP overcomes this defect by approximating f(v) with a piecewise polynomial consisting of two linear polynomials and one quadratic polynomial. In addition, VPP estimates the environment parameters with the expectation-maximization (EM) algorithm.
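
The contrast between a Taylor expansion and a piecewise polynomial can be illustrated with a small scalar sketch. It assumes the environment function has the form f(v) = log(1 + exp(v)) that is common in the VTS literature; the abstract does not state f(v) explicitly. The breakpoint `a`, the quadratic coefficients, and the function names are hypothetical choices made for illustration and are not the coefficients used in the paper.

```python
import math

def env_f(v):
    """Assumed environment function f(v) = log(1 + exp(v)) (not given in the abstract)."""
    return math.log1p(math.exp(v))

def taylor1(v, v0):
    """First-order Taylor expansion of env_f around the expansion point v0."""
    slope = 1.0 / (1.0 + math.exp(-v0))  # f'(v0) is the logistic sigmoid
    return env_f(v0) + slope * (v - v0)

def piecewise(v, a=4.0 * math.log(2.0)):
    """Hypothetical piecewise polynomial: two linear pieces and one quadratic.

    The linear pieces are the asymptotes of env_f (0 on the left, v on the
    right). The quadratic (v + a)^2 / (4a) joins them with matching value and
    slope at -a and +a; a = 4 log 2 makes it exact at v = 0.
    """
    if v < -a:
        return 0.0
    if v > a:
        return v
    return (v + a) ** 2 / (4.0 * a)

# Compare worst-case errors on a grid far wider than the Taylor neighbourhood.
grid = [i / 10.0 for i in range(-100, 101)]
taylor_err = max(abs(env_f(v) - taylor1(v, 0.0)) for v in grid)
pp_err = max(abs(env_f(v) - piecewise(v)) for v in grid)
print(f"max |error|, Taylor at v0=0: {taylor_err:.3f}")
print(f"max |error|, piecewise:      {pp_err:.3f}")
```

On this grid the Taylor error grows with the distance from v0, while the piecewise error stays bounded, which is the behaviour the paragraph above attributes to VPP.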

Experimental results are presented on the application of this approach to Mandarin large vocabulary continuous speech recognition (LVCSR) degraded by different transmission channels (such as fixed telephone lines and GSM) and by background noise. The proposed VPP algorithm is found to converge quickly, and the method reduces the average character error rate (CER) by about 12%.