ISCA Archive SIGUL 2023

Short-Cutting Manual Acquisition in Deep-Learning Deciphering of Old Documents

Dan Cristea, Petru Rebeja

We present an approach that reduces the effort of acquiring annotated data for training a system that automatically transcribes and transliterates old documents. The research is motivated by the goal of deciphering, into Latin script, Romanian documents written in Cyrillic between the 16th and 19th centuries. We briefly describe the overall project, then focus on an alignment algorithm that reuses existing manual transcripts to automatically acquire training data for improving neural network models. The approach is easily reproducible for other languages.