ISCA Archive Eurospeech 2003

Data-driven pronunciation modeling for ASR using acoustic subword units

Thurid Spiess, Britta Wrede, Gernot A. Fink, Franz Kummert

We describe a data-driven method for modeling pronunciation variation in ASR that uses automatically derived acoustic subword units. The unit inventory is designed to produce maximally separable pronunciation variants of words, while only the variants most important for the particular application are trained. The optimal number of variants per word is determined iteratively. All of this is accomplished (almost) fully automatically using a state splitting algorithm and a variant distance measure. Compared to a baseline system that uses triphones as subword units and minimal pronunciation variants, the method achieved a 10% relative reduction in word error rate.
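
To make the reported figure concrete, the following minimal sketch shows how a relative word error rate reduction is conventionally computed. The function name and the two WER values are illustrative assumptions; the abstract gives only the relative figure, not the absolute error rates.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the fraction by which new_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values in percent, NOT taken from the paper.
baseline = 20.0  # assumed WER of a triphone baseline
improved = 18.0  # assumed WER of a system using acoustic subword units

reduction = relative_wer_reduction(baseline, improved)
print(f"relative WER reduction: {reduction:.1%}")
```

Under these assumed values, an absolute drop of 2 percentage points corresponds to a 10% relative reduction, which is why relative and absolute improvements should not be confused when comparing systems.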