An aggregation workflow based on linked data for the common European data space for cultural heritage

nfreire · August 20, 2024, 4:46am

Presenters

Nuno Freire (@nfreire)
Bob Coret (@coret)

Slides and Recordings

Slides
Recordings: TIB AV Portal

Abstract

As part of the work on innovating the operating models and aggregation methods of the common European data space for cultural heritage, Europeana and the Dutch Digital Heritage Network cooperated to broaden the data space's options for data aggregation by defining an aggregation method based on linked data.

Although the two organisations currently use different aggregation workflows, they successfully defined a generalisable method for linked data aggregation. The method builds on dataset-level metadata expressed with the Data Catalog Vocabulary (DCAT). These dataset descriptions must follow guidelines which ensure that machines can fully interpret the information about each dataset's distributions, so that the datasets can be harvested (downloaded) automatically.

The aggregation workflow starts when a provider gives Europeana the URI of a dataset. The URI must either resolve to a DCAT description of the dataset or be queryable with a SPARQL DESCRIBE query on a SPARQL endpoint whose address the provider also supplies. Next, Europeana reads the locations of the dataset's downloadable EDM distributions from this metadata and downloads them automatically. The distributions do not have to follow the EDM RDF/XML schema (any well-known RDF serialisation may be used), because Europeana segments the RDF data into individual EDM records. At this point, the datasets are ready for Europeana's normal ingestion process.
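The first and last steps of this workflow can be sketched in Python as follows, again under stated assumptions. `describe_request` builds, without sending, a SPARQL 1.1 Protocol GET request whose `query` parameter holds `DESCRIBE <dataset URI>`. `segment_records` takes triples already parsed by an RDF library (represented here as plain string tuples, with literals as bare strings) and splits them into one record per `ore:Aggregation`, following links to the aggregated `edm:ProvidedCHO`, web resources and contextual entities while stopping at other records' aggregations and CHOs. Europeana's actual segmentation rules are not described in this abstract; both functions and all example IRIs are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE_AGGREGATION = "http://www.openarchives.org/ore/terms/Aggregation"
EDM = "http://www.europeana.eu/schemas/edm/"
EDM_PROVIDED_CHO = EDM + "ProvidedCHO"
EDM_AGGREGATED_CHO = EDM + "aggregatedCHO"


def describe_request(endpoint, dataset_uri):
    """Build (but do not send) a SPARQL 1.1 Protocol GET request for DESCRIBE."""
    query = f"DESCRIBE <{dataset_uri}>"
    # Assumes `endpoint` has no query string of its own.
    return Request(f"{endpoint}?{urlencode({'query': query})}",
                   headers={"Accept": "text/turtle"})


def segment_records(triples):
    """Split (s, p, o) string triples into one record per ore:Aggregation.

    A record holds every triple reachable from its aggregation, but the walk
    never enters another record's aggregation or ProvidedCHO, so a link to a
    different object stays a plain reference.
    """
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((s, p, o))
    typed = [(s, o) for s, p, o in triples if p == RDF_TYPE]
    roots = {s for s, o in typed if o in (ORE_AGGREGATION, EDM_PROVIDED_CHO)}
    records = {}
    for agg in sorted({s for s, o in typed if o == ORE_AGGREGATION}):
        own = {agg} | {o for _, p, o in by_subject[agg] if p == EDM_AGGREGATED_CHO}
        seen, stack, record = set(), [agg], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            for s, p, o in by_subject.get(node, []):
                record.append((s, p, o))
                # Follow objects that are described in the graph, unless they
                # belong to a different record.
                if o in by_subject and (o not in roots or o in own):
                    stack.append(o)
        records[agg] = sorted(record)
    return records


# Hypothetical input: two objects sharing one agent, one linking to the other.
EX = "https://example.org/"
DC = "http://purl.org/dc/elements/1.1/"
graph = [
    (EX + "agg/1", RDF_TYPE, ORE_AGGREGATION),
    (EX + "agg/1", EDM_AGGREGATED_CHO, EX + "cho/1"),
    (EX + "agg/1", EDM + "isShownBy", EX + "img/1.jpg"),
    (EX + "cho/1", RDF_TYPE, EDM_PROVIDED_CHO),
    (EX + "cho/1", DC + "title", "Example title"),
    (EX + "cho/1", DC + "creator", EX + "agent/a"),
    (EX + "cho/1", "http://purl.org/dc/terms/isPartOf", EX + "cho/2"),
    (EX + "agent/a", RDF_TYPE, EDM + "Agent"),
    (EX + "agg/2", RDF_TYPE, ORE_AGGREGATION),
    (EX + "agg/2", EDM_AGGREGATED_CHO, EX + "cho/2"),
    (EX + "cho/2", RDF_TYPE, EDM_PROVIDED_CHO),
    (EX + "cho/2", DC + "creator", EX + "agent/a"),
]

print(describe_request(EX + "sparql", EX + "dataset/1").full_url)
# → https://example.org/sparql?query=DESCRIBE+%3Chttps%3A%2F%2Fexample.org%2Fdataset%2F1%3E
for agg, record in segment_records(graph).items():
    print(agg, len(record))
# prints:
# https://example.org/agg/1 8
# https://example.org/agg/2 5
```

In this sketch, a contextual entity shared between objects (`agent/a`) is copied into every record that references it, while the `dcterms:isPartOf` link from one object to another is kept only as a reference.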

The method was tested in practice in a pilot in which several datasets from the National Archives of the Netherlands were aggregated into Europeana. The Dutch aggregator DC4EU converted the datasets from the National Archives ontology to EDM, and the DCAT metadata was made available in the dataset register of the Dutch Digital Heritage Network.