Monolingual segments of Europeana metadata

The resource includes a selection of monolingual metadata sourced from the Europeana platform.
The data covers five languages: English (EN), German (DE), Spanish (ES), Latvian (LV), and Dutch (NL). The segments have been extracted from different metadata properties of the Europeana Data Model, which captures aspects of a cultural heritage (CH) item, such as the title of a painting or its description. The textual values were selected based on the language tags declared in the metadata and then underwent segmentation and cleaning. Incorrect language tags were corrected automatically using a language detector, and the values were then split into sentences with an automatic segmenter. Further filtering was applied to prune low-quality segments.
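
The processing steps described above (language-tag correction, sentence splitting, filtering) can be illustrated with a minimal Python sketch. It is not the pipeline used to build this resource: the stopword-based detector, the regex segmenter, the quality threshold, and all function names are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import re

# Hypothetical stopword lists: a toy stand-in for a real language detector.
STOPWORDS = {
    "en": {"the", "of", "and", "a", "in", "is"},
    "de": {"der", "die", "das", "und", "ein", "ist"},
    "es": {"el", "la", "de", "y", "un", "es"},
    "lv": {"un", "ir", "ar", "par", "no", "uz"},
    "nl": {"de", "het", "een", "en", "van", "is"},
}


def detect_language(text, declared):
    """Return the best-scoring language, keeping the declared tag on ties."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Override the declared tag only when another language has strictly more evidence.
    if scores[best] > scores.get(declared, 0):
        return best
    return declared


def split_sentences(text):
    """Naive segmenter: split on whitespace that follows ., ! or ?"""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def is_clean(segment, min_letter_ratio=0.5):
    """Assumed quality filter: keep segments that consist mostly of letters."""
    if len(segment) < 3:
        return False
    letters = sum(ch.isalpha() for ch in segment)
    return letters / len(segment) >= min_letter_ratio


def build_segments(records):
    """Map (declared_lang, text) records to {lang: [unique clean segments]}."""
    output = {}
    seen = set()
    for declared, text in records:
        lang = detect_language(text, declared)
        for segment in split_sentences(text):
            if is_clean(segment) and (lang, segment) not in seen:
                seen.add((lang, segment))
                output.setdefault(lang, []).append(segment)
    return output


records = [
    ("en", "Portrait of a woman in a garden. Oil on canvas."),
    ("en", "Das ist ein Bild der Stadt und der Kirche."),  # mislabelled value
    ("nl", "12345 -- 678"),  # no usable text, dropped by the filter
]
result = build_segments(records)
```

In this sketch the second record is re-tagged from EN to DE before segmentation, and the third record is removed by the filter. A real pipeline would replace each stand-in with a trained language detector and a language-aware sentence segmenter.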

DSI Relevance: Europeana