ISCA Archive ISCSLP 2004

Progress on Mandarin Conversational Telephone Speech Recognition

MeiYuh Hwang, Xin Lei, Tim Ng, Ivan Bulyko, Mari Ostendorf, Andreas Stolcke, Wen Wang, Jing Zheng, Venkata Ramana Rao Gadde, Martin Graciarena, Yan Huang, Manhung Siu

Over the past decade, there has been good progress on English conversational telephone speech (CTS) recognition, built on the Switchboard and Fisher corpora. In this paper, we present our efforts to extend language-independent technologies to Mandarin CTS and to address language-dependent issues such as tone. We show the impact of each of the following factors: (a) a simplified Mandarin phone set, (b) pitch features, (c) automatically retrieved web texts for augmenting n-gram training, (d) speaker adaptive training, (e) maximum mutual information estimation, (f) decision-tree-based parameter sharing, (g) cross-word co-articulation modeling, and (h) combining MFCC and PLP decoding outputs using confusion networks. Combining (a)+(b)+(c)+(f)+(g) reduces the Chinese character error rate (CER) on the BBN-2003 development test set from 53.8% to 46.8%. Further reduction in CER is anticipated after all improvements are integrated.
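
For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate is conventionally computed, assuming the standard definition (substitutions + deletions + insertions, divided by the number of reference characters). The abstract does not state which scoring tool or text normalization was used, so the function names and the example strings here are hypothetical illustrations, not the authors' evaluation pipeline. The last helper simply restates the reported 53.8% to 46.8% change as a relative reduction.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(before, after):
    """Relative error reduction, as a fraction of the starting error rate."""
    return (before - after) / before


# Hypothetical example: one substituted character in a five-character reference.
print(cer("我想去北京", "我要去北京"))

# The reported CERs (in percent) expressed as a relative reduction.
print(round(relative_reduction(53.8, 46.8), 3))
```

Under this definition, the reported improvement is 7.0 points absolute, or roughly 13% relative.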