In this paper, a Vector Piecewise Polynomial (VPP) approximation algorithm is proposed for robust speech recognition in telecommunication environments. The method is formulated in a statistical framework so as to compensate optimally for noise effects, given the observed noisy speech, a statistical model of speech recorded in a clean reference environment, and an estimate of the noisy recognition environment.
The VPP algorithm extends P. J. Moreno's Vector Taylor Series (VTS) approximation for handling the distortion caused by channel effects and background noise. Moreno replaced the environment function f(v) by its vector Taylor series approximation, which is well known to be imprecise when the variable v is not close to the Taylor expansion point v0. VPP overcomes this defect by approximating f(v) with a piecewise polynomial, namely two linear polynomials and a quadratic polynomial. In addition, VPP estimates the environment parameters with the expectation-maximization (EM) algorithm.
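The contrast between the two approximations can be illustrated with a minimal sketch. It assumes the log-spectral environment function f(v) = log(1 + exp(v)) commonly used with VTS; the abstract does not state f(v) explicitly. The breakpoint `a` and the quadratic coefficients are hypothetical choices made for illustration only, not the values used by the VPP algorithm.

```python
import math

def f(v):
    """Assumed environment function log(1 + exp(v)), evaluated stably."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))

def taylor1(v, v0):
    """First-order Taylor expansion of f around the expansion point v0."""
    slope = 1.0 / (1.0 + math.exp(-v0))  # f'(v0) is the logistic sigmoid
    return f(v0) + slope * (v - v0)

# Hypothetical breakpoint: chosen so that the quadratic equals f at v = 0.
A = 4.0 * math.log(2.0)

def piecewise(v, a=A):
    """Illustrative piecewise fit: two linear pieces joined by a quadratic.

    The linear pieces are the asymptotes of f (0 on the left, v on the
    right); the quadratic matches their values and slopes at -a and +a.
    """
    if v <= -a:
        return 0.0
    if v >= a:
        return v
    return (v + a) ** 2 / (4.0 * a)

if __name__ == "__main__":
    # Compare both approximations far from the Taylor expansion point v0 = 0.
    for v in (-6.0, -2.0, 0.0, 2.0, 6.0):
        print(v, f(v), taylor1(v, 0.0), piecewise(v))
```

Under these assumptions the Taylor error grows with the distance from v0, whereas the piecewise error stays bounded over the whole range of v.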
Experimental results are presented on the application of this approach to Mandarin large vocabulary continuous speech recognition (LVCSR), where performance is degraded by different transmission channels (such as fixed telephone lines and GSM) and by background noise. The proposed VPP algorithm is found to converge quickly, and it reduces the average character error rate (CER) by about 12%.