ISCA Archive Eurospeech 2001

Large broadcast news and read speech corpora of spoken Czech

Josef Psutka, Vlasta Radova, Ludek Müller, Jindrich Matousek, Pavel Ircing, David Graff

This paper presents the first large annotated and phonetically transcribed speech corpora developed for spoken Czech. All corpora were collected during the last two years at the Department of Cybernetics, University of West Bohemia (UWB) in Pilsen. The first two collections are broadcast news; the third is a high-quality read-speech database. The paper describes the collection conditions, annotation, and phonetic transcription process for each corpus. The basic phonetic and lexical characteristics of all corpora are given and compared with one another.