This paper proposes a method that efficiently controls the beam width, yielding computation times that make the automatic transcription of massive volumes of speech data practical. In particular, we focus on the fact that much time is wasted attempting to recognize poor-quality speech samples, which yield erroneous transcripts and thus provide no useful data for subsequent text processing. To stabilize the recognition time regardless of speech quality, our proposal controls the score beam width based on the overall score spread of each target speech sample, on the premise that the speech is stored, so that each sample is available in full before decoding. It formulates the prior score range within the beam width and maximizes computation efficiency by normalizing that range with respect to the survival rate of hypotheses. The proposed technique can rapidly estimate the range using only monophones, prior to speech recognition decoding. Experiments at several signal-to-noise ratios (SNRs) and on real call-center speech sets confirm a reduction in computation time while matching the accuracy of existing techniques.
Index Terms: speech recognition, decoding parameter optimization, beam search