CEF Data Marketplace multilingual benchmark for the evaluation of cleaning and clustering tools

CEF-DM Multilingual Benchmark

Five parallel corpora (En-Cs, En-De, En-It, En-Lv, De-It) manually annotated by professional translators. Each translation unit (TU) in the datasets is annotated with information about whether (i) it is clean, i.e. the translation is correct and fully equivalent to its source text, and (…