ISCA Archive DiSS 2013

HESITA(tions) in Portuguese: a database

Sara Candeias, Dirce Celorico, Jorge Proença, Arlindo Veiga, Fernando Perdigão

This paper presents a European Portuguese database of hesitations in speech. Named HESITA, the database contains annotations of hesitation events such as filled pauses, vocalic extensions, truncated words, repetitions and substitutions. The hesitations were identified in 30 daily news programs collected from the podcasts of a Portuguese television channel. The database also includes speaking-style classification, as well as acoustic information and other speech events. A statistical analysis of the occurrence of hesitation events is presented. Insights into the process of human speech communication can be extracted from this database, which contains relevant information about how Portuguese speakers hesitate. The HESITA database is freely available online to the research community.

Index Terms: hesitations, disfluency, prepared speech, spontaneous speech, annotation, hesitation corpus

Cite as: Candeias, S., Celorico, D., Proença, J., Veiga, A., Perdigão, F. (2013) HESITA(tions) in Portuguese: a database. Proc. Disfluency in Spontaneous Speech (DiSS 2013), 13-16

  author={Sara Candeias and Dirce Celorico and Jorge Proença and Arlindo Veiga and Fernando Perdigão},
  title={{HESITA(tions) in Portuguese: a database}},
  booktitle={Proc. Disfluency in Spontaneous Speech (DiSS 2013)},