ISCA Archive Eurospeech 1995

Design of a phonetic corpus for a speech database in the Basque language

K. Lopez de Ipina, I. Torres, L. Onederra

Designing a continuous speech recognition system requires selecting a large amount of spoken data for each specific language. The goal of this work was to design a phonetic corpus for a speech database in the Basque language. Several samples of present-day narrative, spoken language and newspaper language were first analysed from a phonetic point of view. The resulting speech database consists of a phonetic corpus of 300 phonetically balanced sentences, each uttered twice by 40 speakers, yielding about 900,000 allophones. Two additional corpora, of digits and of short words, complete the database. The database provides an adequate distribution of allophones and phonetic contexts for modelling Basque phones both in speech recognition systems and in linguistic analysis frameworks.

Keywords: Speech Databases, Basque language.
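As a quick sanity check on the corpus size, the figures above can be related by simple arithmetic. This is a minimal sketch, assuming (the abstract does not state it explicitly) that every one of the 40 speakers read all 300 sentences twice. Under that assumption the corpus holds 24,000 utterances, and the reported total of about 900,000 allophones corresponds to roughly 37.5 allophones per utterance. The function name `corpus_stats` is hypothetical.

```python
# Figures taken from the abstract; the allophone total is approximate.
SENTENCES = 300
REPETITIONS = 2
SPEAKERS = 40
TOTAL_ALLOPHONES = 900_000

def corpus_stats(sentences, repetitions, speakers, total_allophones):
    """Hypothetical helper. Assumes every speaker read every sentence
    `repetitions` times; returns (utterance count, mean allophones per utterance)."""
    utterances = sentences * repetitions * speakers
    per_utterance = total_allophones / utterances
    return utterances, per_utterance

utterances, per_utterance = corpus_stats(SENTENCES, REPETITIONS, SPEAKERS, TOTAL_ALLOPHONES)
print(f"{utterances} utterances, ~{per_utterance:.1f} allophones each")
# prints "24000 utterances, ~37.5 allophones each"
```

If speakers instead read disjoint subsets of the sentences, the utterance count and the per-utterance average would differ, so the figure of 37.5 holds only under the stated assumption.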