ISCA Archive Eurospeech 1999

Sooner or later: exploring asynchrony in multi-band speech recognition

Nikki Mirghafori, Nelson Morgan

Multi-band speech recognition is an exploratory paradigm in which each frequency region is treated as a distinct source of information and the streams are combined after each is processed independently. A number of researchers have hypothesized that it is advantageous to combine the sub-band information asynchronously. This paper examines this hypothesis using two different approaches to relaxing synchrony constraints: HMM decomposition/recombination [19] and two-level dynamic programming (DP) [16]. Drawing on this work and that of others [2, 18], we conclude that relaxing the synchrony constraints indiscriminately for all phone-to-phone transitions does not consistently and significantly reduce the word error rate. The optimal permissible asynchrony must depend on both the phone-class transitions and the training-data statistics.