The objective of this project is to begin collecting a large corpus of Italian speech. This activity will provide infrastructure support for Italian speech and language technology. A series of corpora will be collected to support general research in these areas, specific evaluation tasks, or both. The first corpus is oriented toward CVCV (consonant-vowel-consonant-vowel) utterances.
The data were collected and organised in conformity with the standards currently being formulated by the Esprit Project 2589 "SAM".
The corpora will be distributed on CD-ROM by the Italian Superior Institute of Telecommunications and will be made available to the whole Italian speech and language technology community.