Online recognition requires the acoustic model to provide posterior
probabilities within a limited time delay of the incoming audio
data. This necessitates unidirectional modeling, and the standard solution
is to use unidirectional long short-term memory (LSTM) recurrent neural
networks (RNNs) or feed-forward neural networks (FFNNs).
It is known that bidirectional
LSTMs are more powerful and perform better than unidirectional LSTMs.
To demonstrate the performance difference, we start by comparing several
different bidirectional and unidirectional LSTM topologies.
Furthermore, we apply
a modification to bidirectional RNNs to enable online recognition by
moving a window over the input stream and performing one forward pass through
the RNN on each window. We then combine the posteriors from all forward passes
and renormalize them. Our experiments show that
this online-enabled bidirectional LSTM performs as well as the offline
bidirectional LSTM and much better than the unidirectional LSTM.
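
The windowed forwarding described above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the text: overlapping windows are combined by summing their per-frame posteriors, renormalization divides each frame's combined vector by its sum, and the names `windowed_posteriors`, `forward_window`, `window_size`, `step`, and `toy_forward` are hypothetical. The toy model is only a stand-in for a bidirectional RNN so that the example runs on its own.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]
Posteriors = List[List[float]]  # one class distribution per frame


def windowed_posteriors(
    frames: Sequence[Frame],
    forward_window: Callable[[Sequence[Frame]], Posteriors],
    window_size: int,
    step: int,
) -> Posteriors:
    """Forward each window once, then combine and renormalize per frame."""
    if window_size <= 0 or step <= 0 or step > window_size:
        raise ValueError("need 0 < step <= window_size so every frame is covered")
    n = len(frames)
    if n == 0:
        return []

    # Window start positions; add a final window if the tail is not covered.
    starts = list(range(0, max(n - window_size, 0) + 1, step))
    if starts[-1] + window_size < n:
        starts.append(n - window_size)

    sums: List[List[float]] = [[] for _ in range(n)]
    for start in starts:
        window = frames[start:min(start + window_size, n)]
        # One forward pass over the window (assumed to return one
        # distribution per frame of the window).
        for offset, dist in enumerate(forward_window(window)):
            acc = sums[start + offset]
            if not acc:
                acc.extend([0.0] * len(dist))
            for k, p in enumerate(dist):
                acc[k] += p

    # Renormalize so that each frame's combined posteriors sum to one.
    # Assumes the model never returns an all-zero vector.
    return [[p / sum(acc) for p in acc] for acc in sums]


def toy_forward(window: Sequence[Frame]) -> Posteriors:
    """Hypothetical stand-in model: depends on the whole window's context."""
    mean = sum(f[0] for f in window) / len(window)
    out = []
    for f in window:
        p1 = 1.0 / (1.0 + math.exp(-(f[0] - mean)))
        out.append([1.0 - p1, p1])
    return out


frames = [[float(t)] for t in range(9)]
posteriors = windowed_posteriors(frames, toy_forward, window_size=4, step=2)
```

In this sketch the latency is bounded by the window length, since a frame's posteriors are available once the last window covering it has been forwarded. Other combination rules (for example weighting frames by their position within the window) would fit the same structure.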