An approach to speech recognition using syllables as basic modelling units is compared to a state-of-the-art system employing phonemes. The technological framework is a hybrid HMM-ANN recognition system applied to small- to medium-vocabulary recognition tasks. Although the number of units to be classified nearly doubles, it is shown that the syllable can outperform the phoneme slightly but significantly in terms of unit classification capability, measured as frame error rate. In terms of overall system performance, measured as word error rate, the phoneme-based system still performs clearly better on continuous speech tasks, while the syllable-based system is superior on isolated word recognition tasks in cross-database tests. This suggests the need for further work on understanding the interaction of knowledge sources at the frame, word, and sentence levels in current recognition systems.
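The two evaluation measures named above can be illustrated with a minimal sketch. It assumes the commonly used definitions: frame error rate as the fraction of frames whose predicted unit label differs from the reference label, and word error rate as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function names `frame_error_rate` and `word_error_rate` are hypothetical and are not taken from the system described here, whose exact scoring procedure is not specified in this text.

```python
from typing import Sequence


def frame_error_rate(ref_labels: Sequence[str], hyp_labels: Sequence[str]) -> float:
    """Fraction of frames whose hypothesised unit label differs from the reference."""
    if len(ref_labels) != len(hyp_labels):
        raise ValueError("reference and hypothesis must have the same number of frames")
    if not ref_labels:
        raise ValueError("at least one frame is required")
    errors = sum(r != h for r, h in zip(ref_labels, hyp_labels))
    return errors / len(ref_labels)


def word_error_rate(ref_words: Sequence[str], hyp_words: Sequence[str]) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis: i deletions
        for j, h in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref_words)
```

Under these assumed definitions, a unit classifier can have a lower frame error rate and still yield a higher word error rate, because the word-level score also depends on the lexicon, the language model, and the decoding step.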