ISCA Archive Eurospeech 1997

CORPORA - speech database for Polish diphones

Stefan Grocholewski

This paper presents the first efforts to create speech databases for Polish. Of the two databases, supported by the Polish National Research Committee and by COPERNICUS project 1304 ("BABEL: a Multi-Language Database" for Polish, Bulgarian, Estonian, Hungarian, and Romanian), the first is presented in detail. The speech material comprises 365 utterances (alphabet letters, digits, 200 first names, and 114 sentences) spoken by 45 speakers. The paper describes the design ideas, recording conditions, annotation rules, and the method of automatic segmentation and labelling used in CORPORA.