Two schemes for obtaining phonemic transcriptions of spoken utterances are described and compared. Both schemes first use Kohonen Self-Organizing Maps to vector-quantize speech into a sequence of phoneme labels spaced one centisecond (10 ms) apart. In the original scheme, this quasiphoneme sequence is converted into a phoneme string with simple durational transformation rules. In the scheme introduced in this paper, the conversion is carried out by a multi-layered feed-forward network trained with error back-propagation. The phonemic recognition error rate achieved with the multi-layered network approach is about 2.5 percentage points lower (19.2% as opposed to 21.7%). However, the back-propagation algorithm requires a vast amount of training compared with the rule-based method.
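
To illustrate what a durational transformation rule might look like, the following is a minimal sketch and not the rule set used in this work. It assumes a single hypothetical rule: a run of identical quasiphoneme labels counts as a phoneme only if it lasts at least `min_frames` frames, and shorter runs are treated as quantization noise. The function name `collapse_quasiphonemes`, the threshold value, and the example labels are all assumptions made for illustration.

```python
from itertools import groupby


def collapse_quasiphonemes(labels, min_frames=3):
    """Collapse a frame-level label sequence into a phoneme list.

    labels     -- one quasiphoneme label per 10 ms frame
    min_frames -- hypothetical minimum run length for a phoneme to be accepted
    """
    phonemes = []
    for label, run in groupby(labels):
        duration = sum(1 for _ in run)  # run length in frames
        if duration < min_frames:
            continue  # too short: treat the run as noise and drop it
        if phonemes and phonemes[-1] == label:
            continue  # same phoneme resumed after a dropped glitch: merge
        phonemes.append(label)
    return phonemes


# Hypothetical frame labels: a one-frame "o" glitch interrupts the "a" segment.
frames = list("kkkkaaaaoaaatttt")
print(collapse_quasiphonemes(frames))  # ['k', 'a', 't']
```

A fixed threshold like this needs no training, which is the practical advantage of the rule-based scheme. The network-based scheme instead has to learn the mapping from label sequences to phonemes from data.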