ISCA Archive Interspeech 2006

New improvements in decoding speed and latency for automatic captioning

Jian Xue, Rusheng Hu, Yunxin Zhao

In this paper, we present new improvements in decoding speed and latency for automatic captioning in telehealth. Complementary local word confidence scores are used to prune uncompetitive search paths. Subspace distribution clustering hidden Markov modeling (SDCHMM) is used for fast generation of acoustic and local confidence scores, where overlap accumulative probability (OAP) is used to measure the similarity of Gaussian pdfs in SDCHMM. To decrease latency, we propose pre-backtrace based on the detection of prosodic boundaries, which are defined by unfilled pauses, filled pauses, and pitch contour. Experiments were conducted on a telehealth captioning task with vocabulary sizes of 21 K and 46 K. The proposed methods led to a 33% improvement in decoding speed without loss of word accuracy, and to a threefold decrease in maximum latency with about a 1.6% loss of word accuracy.
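The abstract does not define OAP, so the following is only an illustrative sketch of one overlap-based similarity between two one-dimensional Gaussian pdfs: the area under the pointwise minimum of the two densities. It should not be read as the authors' OAP formula. The function names `gaussian_pdf` and `overlap_area`, the integration range of 8 standard deviations, and the step count are all assumptions made for this example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def overlap_area(mean_a, std_a, mean_b, std_b, steps=20000):
    """Area under min(p_a, p_b), estimated with the midpoint rule.

    Assumption: this is a generic overlap measure chosen for illustration,
    not the OAP definition used in the paper. The result lies in [0, 1]:
    close to 1 for near-identical pdfs, close to 0 for well-separated ones.
    """
    # Integrate over a range that covers both pdfs out to 8 standard deviations.
    lo = min(mean_a - 8.0 * std_a, mean_b - 8.0 * std_b)
    hi = max(mean_a + 8.0 * std_a, mean_b + 8.0 * std_b)
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += min(gaussian_pdf(x, mean_a, std_a),
                     gaussian_pdf(x, mean_b, std_b))
    return total * dx


if __name__ == "__main__":
    # Identical pdfs overlap almost completely; distant pdfs barely overlap.
    print(overlap_area(0.0, 1.0, 0.0, 1.0))
    print(overlap_area(0.0, 1.0, 2.0, 1.0))
    print(overlap_area(0.0, 1.0, 20.0, 1.0))
```

A measure of this kind is symmetric in its two arguments and bounded, which makes it convenient as a clustering criterion: pdfs whose overlap exceeds a chosen threshold could be tied to a shared prototype. Whether the paper applies OAP in exactly this way is not stated in the abstract.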