Corpus EPTIC

All corpora are available as morpho-syntactically and structurally annotated text.

1. all sub-corpora whose names contain the string "_in_" are transcriptions of speech (original speeches to Parliament or oral interpretations), created from scratch from the videos available on the site https: // www.;
2. every text is annotated with a variety of metadata concerning the text (date, length in words, subject, whether it is an original or a translation, etc.; for oral texts, also the duration of the speech in seconds, speaking rate, etc.) and the speaker (name, gender, political party, etc.).
3. the segmentation into sentences and the sentence-level alignment between the various sub-corpora (e.g. source texts with target texts, transcripts of oral interpretations with written translations, target texts in one language with texts in a different language, etc.) were created for EPTIC.
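
As a minimal sketch of the naming convention in point 1, the following Python snippet separates spoken sub-corpora from the rest by checking for the "_in_" string. The sub-corpus names used here are hypothetical placeholders invented for illustration, not actual EPTIC identifiers, and the function name is likewise an assumption.

```python
def is_spoken(subcorpus_name: str) -> bool:
    """Return True if the name contains "_in_", which marks a transcription of speech."""
    return "_in_" in subcorpus_name


# Hypothetical sub-corpus names, made up for this example only.
names = ["example_in_en", "example_tr_en", "example_in_it"]

# Keep only the sub-corpora that are transcriptions of speech.
spoken = [name for name in names if is_spoken(name)]
print(spoken)
```

Any sub-corpus whose name lacks the "_in_" string is treated here as written text; that reading is inferred from point 1 rather than stated explicitly.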

Provided by Silvia Bernardini, Adriano Ferraresi, Marie-Aude Lefer, Maja Miličević, Rita Micchi, Manuela Santandrea,
University of Bologna Dept. of Interpreting and Translation