This paper presents a simple sequence-to-sequence approach to restoring
standard orthography in raw, normalized speech transcripts, including
insertion of punctuation marks, prediction of capitalization, restoration
of numeric forms, formatting of dates and times, and other, fully data-driven
adjustments. We further describe our method to generate synthetic parallel
training data, and explore suitable performance metrics, which we align
with human judgment through subjective MOS-like evaluations.
Our models for English, Russian, and German achieve word error rates of
6.36%, 4.88%, and 5.23%,
respectively. We focus on simplicity and reproducibility, make our
framework available under a BSD license, and share our base models
for English and Russian.
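
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a word error rate can be computed as word-level edit distance divided by reference length. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the tokenization and scoring details used for the reported figures are not specified in this abstract and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are obtained by whitespace splitting, so case and
    attached punctuation count as part of the word.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative strings (not taken from the paper's data).
reference = "We met on May 5, 2020."
hypothesis = "we met on may 5 2020"
wer = word_error_rate(reference, hypothesis)
```

Because the comparison in this sketch is sensitive to case and punctuation, a hypothesis that differs from the reference only in capitalization or punctuation marks is still penalized, which is the kind of error an orthography restoration model is meant to reduce.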