PaWaC - Public Administration Web as Corpus (Processed)

The PaWaC corpus was designed and developed by the University of Pisa within SEMPLICE (SEMantic Instruments for PubLIc Administrators and CitizEns), a Tuscan regional project involving several local SMEs (http://www.progettosemplice.it/).

The original corpus is composed of 4,172 documents. It gathers a wide range of administrative acts (resolutions, circular letters, etc.) representative of the Italian language used in public administration, and is freely available for research purposes.

After processing, 396,563 high-quality segments remain.

Resp: CoLing Lab - Laboratorio di Linguistica Computazionale - http://colinglab.humnet.unipi.it/
The corpus was collected by crawling the websites of 277 Tuscan municipalities.

DSI Relevance: eJustice