This paper brings together the disfluency coding carried out by Lickley [1] and the recognition work carried out as part of the ERF project (Bard, Thompson & Isard [2]) at Edinburgh University. A set of factors is investigated which characterise the behaviour of the automatic speech recogniser (ASR) during recognition, based on an analysis of the resulting word lattice. These factors fall into two groups: Entropy Factors, the entropy of the acoustic and language model likelihoods within the word lattice over a 10 ms frame; and Arc Factors, the number of non-unique and unique arcs in the word lattice in any given 10 ms time frame, together with the variance of the start and end times of these arcs and the number of arcs starting or ending in the frame.
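As a rough illustration of how such per-frame factors might be computed, the following minimal sketch derives them from a list of lattice arcs. The `Arc` structure, the function names, and the normalisation of log-likelihoods into a probability distribution before taking the entropy are all assumptions made for this example, as is the reading of "non-unique" as the total arc count and "unique" as the count of distinct word labels; none of these details are specified in the work described here.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """Hypothetical lattice arc: a word hypothesis with times in seconds."""
    word: str
    start: float
    end: float
    acoustic_loglik: float
    lm_loglik: float


def entropy_from_logliks(logliks):
    """Shannon entropy (bits) of log-likelihoods normalised to sum to one."""
    if not logliks:
        return 0.0
    peak = max(logliks)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logliks]
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def variance_or_zero(values):
    """Population variance, defined as 0.0 for an empty list."""
    return statistics.pvariance(values) if values else 0.0


def frame_factors(arcs, frame_start, frame_len=0.010):
    """Entropy and arc factors for one frame (10 ms by default)."""
    frame_end = frame_start + frame_len
    # Arcs that overlap the frame at all.
    active = [a for a in arcs if a.start < frame_end and a.end > frame_start]
    return {
        "acoustic_entropy": entropy_from_logliks([a.acoustic_loglik for a in active]),
        "lm_entropy": entropy_from_logliks([a.lm_loglik for a in active]),
        "n_arcs": len(active),                           # assumed "non-unique" count
        "n_unique_words": len({a.word for a in active}),  # assumed "unique" count
        "var_start": variance_or_zero([a.start for a in active]),
        "var_end": variance_or_zero([a.end for a in active]),
        "n_starting": sum(frame_start <= a.start < frame_end for a in arcs),
        "n_ending": sum(frame_start < a.end <= frame_end for a in arcs),
    }
```

Calling `frame_factors(arcs, t)` for each successive 10 ms offset `t` would give one feature vector per frame, which is the kind of input a frame-level classifier could be trained on.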
The values of all factors were used to train a simple classification and regression tree (CART) model. The model was used to predict three things: recognition failure, interruption point location (the point where a disfluency begins), and whether the location was in a repair or a reparandum.
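To make the idea of a tree-based predictor concrete, the sketch below shows the core step of growing a classification tree: choosing the single feature and threshold that best separate the labels. The Gini impurity criterion, the function names, and the toy data are assumptions for illustration only; the splitting criterion and tree settings actually used are not stated here.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    rows is a list of equal-length numeric feature vectors, one per frame.
    Returns None if no feature takes more than one distinct value.
    """
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds lie midway between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= thr]
            right = [y for r, y in zip(rows, labels) if r[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, thr, score)
    return best


# Hypothetical frames: [language-model entropy, number of unique words].
rows = [[0.1, 5], [0.2, 1], [0.9, 5], [0.8, 1]]
labels = ["ok", "ok", "fail", "fail"]
split = best_split(rows, labels)
```

A full tree would apply `best_split` recursively to the two resulting subsets until a stopping rule is met; how often each feature is chosen for a split is one common way of judging its contribution to the predictions.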
The entropy of the language model values contributed most to the model's prediction of recognition failure and of whether a frame was in a repair or a reparandum. In contrast, the number of unique word hypotheses contributed most to the successful prediction of a frame being close to an interruption point.
[1] Lickley, Robin J. HCRC disfluency coding manual. HCRC Tech. Rept. HCRC/TR-100.
[2] Bard, Ellen G. / Thompson, Henry S. / Isard, Steve (2000): ERF: Exploiting recognition failures in automatic recognition of disfluent speech. EPSRC, SALT GR/L50280 Final Report.