ISCA Archive Eurospeech 1993

Speech recognition over packetized voice systems

Bo Baungaard, Jorn Stern Nielsen

This paper reports the results of an assessment of speech recognition over a packetized voice system that applies Adaptive Differential Pulse Code Modulation (ADPCM) at different compression rates. The penetration of packetized voice systems in the telephone network is expected to increase. As speech recognition is foreseen to play an important role in network services, speech recognition systems must be robust to the degradation introduced by the coding and decoding often used in packetized voice systems. Three ADPCM coding schemes are assessed: 32 kbit/s, 24 kbit/s, and 16 kbit/s. A test with 64 kbit/s PCM is also conducted. All tests are carried out on a commercially available packetized voice system. The assessment is based on speaker-independent recognition of isolated words using Continuous Hidden Markov Models (CHMM). Both whole-word and triphone models are applied. The results clearly show that performance depends on the applied coding scheme. Performance decreases when training and test data are coded with different schemes, and a dramatic decrease is observed for 16 kbit/s in particular. Furthermore, the capability to perform correct rejection is highly influenced by the applied coding scheme.
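
To give a rough sense of what the four bit rates mean per speech sample, the arithmetic can be sketched as below. The 8 kHz sampling rate is an assumption (it is the usual narrowband telephone rate and is not stated in the abstract), and the function name `bits_per_sample` is purely illustrative.

```python
# Assumption: 8 kHz sampling, the usual narrowband telephone rate.
# The abstract does not state the sampling rate of the tested system.
SAMPLE_RATE_HZ = 8000


def bits_per_sample(bit_rate_bps, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return the number of bits available to encode each speech sample."""
    return bit_rate_bps / sample_rate_hz


# The four coding schemes named in the abstract, in bit/s.
schemes = {
    "PCM 64 kbit/s": 64_000,
    "ADPCM 32 kbit/s": 32_000,
    "ADPCM 24 kbit/s": 24_000,
    "ADPCM 16 kbit/s": 16_000,
}

for name, rate in schemes.items():
    bits = bits_per_sample(rate)
    # Compression ratio relative to the 64 kbit/s PCM reference.
    ratio = schemes["PCM 64 kbit/s"] / rate
    print(f"{name}: {bits:g} bits/sample, {ratio:g}:1 relative to PCM")
```

Under this assumption the 16 kbit/s scheme has only 2 bits per sample, a quarter of the PCM reference, which is consistent with it being the coarsest of the schemes tested.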