CEF Data Marketplace multilingual benchmark for the evaluation of cleaning and clustering tools

CEF-DM Multilingual Benchmark

Five parallel corpora (En-Cs, En-De, En-It, En-Lv, De-It) manually annotated by professional translators. Each translation unit (TU) in the datasets is annotated to indicate whether (i) it is clean, i.e. the translation is correct and fully equivalent to its source text, and […]