We present a Multi-State Time Delay Neural Network (MS-TDNN) for speaker-independent, connected letter recognition. Our MS-TDNN achieves 98.5%/92.0% word accuracy on speaker-dependent/speaker-independent English letter tasks [7, 8]. In this paper we summarize several techniques for improving (a) continuous recognition performance, such as sentence-level training, and (b) phonetic modeling, such as network architectures with "internal speaker models" that allow "tuning in" to new speakers. We also present results on our large and still growing new German letter database, which contains over 40,000 letters continuously spelled by 55 speakers.
Keywords: Spelled Letter Recognition, Speaker-Independence, MS-TDNN