ISCA Archive ISCSLP 2002

Improving performance of telephone-based Mandarin speech recognition

Huayun Zhang, Bo Xu, Taiyi Huang

Because the telephone is the only ubiquitous communication device in today's world, it is the largest potential application field for speech technology. Telephony speech recognition is a core technique for such telephone-based speech applications. It is well known that the bandwidth of a telephone line is limited to 300-3400 Hz and that there are many inherent variations within the telephone network. Together, these make speech recognition over the telephone a more difficult task than its desktop counterpart. Additionally, because of the free speaking style required by real applications and the diverse background environments, a system that performs perfectly in the laboratory may become very vulnerable in the real world. Robustness is therefore a life-and-death issue for such commercial systems. In this paper, we introduce our recent progress in improving the performance of a Mandarin telephony speech recognition system. Our improvements include a more robust and straightforward feature extraction block for telephony speech and a novel dynamic channel compensation algorithm. We then focus our discussion on the strategy for dealing with out-of-vocabulary (OOV) utterances. With all these improvements, the system's performance improves noticeably in real applications.