ISCA Archive RSR 1997

Robust speech recognition based on multi-stream features

Stephane Dupont, Hervé Bourlard, Christophe Ris

In this paper, we discuss a new automatic speech recognition (ASR) approach based on the independent processing and recombination of several feature streams. In this framework, the speech signal is assumed to be represented by multiple input streams, each representing a different characteristic of the signal. If the streams are entirely synchronous, they can be accommodated simply. However, as discussed in the paper, it may be necessary to permit some degree of asynchrony between streams, which are then forced to recombine at temporal "anchor points" associated with (pre-defined) speech unit levels. We start by introducing the basic framework of a statistical structure that can accommodate multiple observation streams. This approach was initially applied to subband-based speech recognition, where it was shown to yield significantly better noise robustness. After summarizing these results, we use the multi-stream approach to combine features from multiple time scales in ASR systems (in our case, to use syllable-level features in a phoneme-based HMM system).