ISCA Archive Eurospeech 1989

Using self-organizing maps and multi-layered feed-forward nets to obtain phonemic transcriptions of spoken utterances

Mikko Kokkonen, Kari Torkkola

Two schemes for obtaining phonemic transcriptions of spoken utterances are described and compared. Both schemes first use the so-called Self-Organizing Kohonen Maps to vector quantize speech into a sequence of phoneme labels spaced one centisecond apart. In the original scheme, this quasiphoneme sequence is converted into a phoneme string with simple durational transformation rules. In the scheme introduced in this paper, the conversion is carried out by a multi-layered feed-forward network trained with error back propagation. The phonemic recognition error rate achieved with the multi-layered network approach is about 2.5 percentage points lower (19.2% as opposed to 21.7%). However, the back propagation algorithm requires a vast amount of training compared to the rule-based method.
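
To illustrate what a durational transformation of a quasiphoneme sequence might look like, the following is a minimal sketch and not the rule set used in the paper. It assumes one label per centisecond frame, collapses runs of identical labels, and discards runs shorter than a threshold. The function name `collapse_quasiphonemes` and the parameter `min_frames` are hypothetical, chosen only for this example.

```python
from itertools import groupby


def collapse_quasiphonemes(labels, min_frames=3):
    """Turn a frame-level label sequence into a phoneme list.

    labels: iterable of per-frame labels, one per centisecond (assumed).
    min_frames: hypothetical minimum run length, in frames, for a run
        of identical labels to be accepted as a phoneme.
    """
    phonemes = []
    for label, run in groupby(labels):
        duration = sum(1 for _ in run)  # run length in frames
        if duration < min_frames:
            continue  # assumption: very short runs are treated as noise
        if phonemes and phonemes[-1] == label:
            # A dropped short run split one phoneme in two; merge the halves.
            continue
        phonemes.append(label)
    return phonemes


frames = list("kkkkaaaaaatiiii")
print(collapse_quasiphonemes(frames))
```

Here the single-frame `t` is shorter than `min_frames` and is dropped, while the longer runs of `k`, `a` and `i` are each reduced to one phoneme. A real rule set would presumably use phoneme-specific duration thresholds, which this sketch does not attempt to model.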