Two types of artificial neural network were trained to recognise phonemes in continuous speech: multi-layer perceptrons (MLP) and probabilistic neural networks (PNN). The speech material was recorded by one male Swedish speaker, and the sentences were phonetically labelled. Fifty sentences were used for training and another fifty for testing. Both networks had a single hidden layer and 38 output nodes corresponding to Swedish phonemes. The MLP was trained with the supervised back-propagation algorithm. The PNN was trained with a self-organising clustering algorithm, a stochastic approximation to the expectation maximisation algorithm. Classification results for a feed-forward MLP and the PNN were rather similar, but an MLP with simple recurrency using context nodes gave the best performance. Several other differences of practical value were noted.
Keywords: phoneme recognition, coarticulation, back-propagation, multi-layer perceptron, simple recurrency, probabilistic neural network, expectation maximisation, supervised/unsupervised training.
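
As a rough illustration of what "simple recurrency using context nodes" can mean, the sketch below shows the forward pass of a single-hidden-layer network in which the hidden activations from the previous frame are fed back as extra inputs. It is a minimal sketch under stated assumptions, not the configuration used in this work: the class name `SimpleRecurrentNet`, the input dimension, the hidden-layer size, the tanh and softmax functions, and the choice to copy the hidden layer (rather than the output layer) into the context nodes are all hypothetical. Only the single hidden layer and the 38 output nodes come from the text. The weights are random and untrained; back-propagation training is omitted.

```python
import math
import random


def softmax(values):
    """Normalise raw output activations so they sum to one."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a trained network would replace these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SimpleRecurrentNet:
    """Single hidden layer plus context nodes (assumed hidden-layer copy)."""

    def __init__(self, n_in, n_hidden, n_out=38, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.w_in = make_matrix(n_hidden, n_in, rng)           # input -> hidden
        self.w_context = make_matrix(n_hidden, n_hidden, rng)  # context -> hidden
        self.w_out = make_matrix(n_out, n_hidden, rng)         # hidden -> output

    def forward(self, frames):
        """Return one 38-way score vector per input frame."""
        context = [0.0] * self.n_hidden  # context nodes start empty
        outputs = []
        for frame in frames:
            from_input = matvec(self.w_in, frame)
            from_context = matvec(self.w_context, context)
            hidden = [math.tanh(a + b) for a, b in zip(from_input, from_context)]
            outputs.append(softmax(matvec(self.w_out, hidden)))
            context = hidden  # copy hidden activations for the next frame
        return outputs


# Hypothetical usage: three identical 16-dimensional frames.
net = SimpleRecurrentNet(n_in=16, n_hidden=8)
frames = [[0.1] * 16 for _ in range(3)]
posteriors = net.forward(frames)
best_phoneme_index = [scores.index(max(scores)) for scores in posteriors]
```

Because the context nodes carry information from earlier frames, identical input frames can yield different output scores at different positions in the sequence, which is one way such a network can take neighbouring speech context into account. Setting the context weights to zero would reduce the sketch to a purely feed-forward MLP.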