Freely available software packages for the training of Conditional Random Fields, e.g. CRF++, do not support longer n-grams than bigram, which can be attributed to the fact that training CRFs in original notation has a polynomial computation time dependence on the target vocabulary size and an exponential dependence on the n-gram size. We transfer the backing-off idea from language models to CRFs. We realized the software with Finite State Transducers, where we modified the calculation of the posterior algorithm. To implement the backing-off scheme, we applied Failure-transitions(Ó) known from Open-FST. Proof of concept is given on the semantic tagging task MEDIA and on the grapheme-tophoneme (G2P) conversion tasks NETtalk and Celex, showing that computational time increases much below the size of the target vocabulary and showing error rate reduction on the G2P tasks.