Presenter
Thomas Kerboul (@Thomas_Kerboul)
Slides and Recordings
- Slides: 20251119_SWIB25_Thomas-Kerboul.pdf (308.6 KB)
- Recordings: YouTube
Abstract
Federated queries combine data from several databases in a single query, making it possible to identify errors and gaps where their content overlaps. The Bibliothèque de Genève uses SPARQL federated queries between Wikidata and IdRef, a French authority file used for bibliographic cataloging, to enhance records about individuals related to Geneva. Using the IdRef identifier as the common link, several modular queries were designed to surface records that could be improved.
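As an illustration of the approach described above, the following Python sketch assembles one such federated query as a string. It is not the query used by the Bibliothèque de Genève. The endpoint URL, the Wikidata identifiers (P269 for the IdRef ID, P19 for place of birth, Q71 for Geneva), the IdRef URI pattern, and the use of skos:prefLabel on the IdRef side are all assumptions to verify before use, as is whether the endpoint running the query permits federation to the other one.

```python
from string import Template

# Assumed values, not taken from the talk: verify each before use.
IDREF_ENDPOINT = "https://data.idref.fr/sparql"  # assumed IdRef SPARQL endpoint
IDREF_PROPERTY = "P269"                          # assumed Wikidata property for the IdRef ID
GENEVA_ITEM = "Q71"                              # assumed Wikidata item for Geneva

QUERY_TEMPLATE = Template("""\
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?person ?idrefId ?idrefLabel WHERE {
  ?person wdt:P19 wd:$place ;
          wdt:$id_prop ?idrefId .
  # The IdRef identifier is the common link; the URI pattern is an assumption.
  BIND(IRI(CONCAT("http://www.idref.fr/", ?idrefId, "/id")) AS ?idrefUri)
  SERVICE <$endpoint> {
    ?idrefUri skos:prefLabel ?idrefLabel .
  }
}
LIMIT $limit
""")


def build_query(place: str = GENEVA_ITEM, limit: int = 100) -> str:
    """Return a federated SPARQL query joining Wikidata and IdRef on the IdRef ID."""
    return QUERY_TEMPLATE.substitute(
        place=place,
        id_prop=IDREF_PROPERTY,
        endpoint=IDREF_ENDPOINT,
        limit=limit,
    )


if __name__ == "__main__":
    print(build_query(limit=10))
```

Keeping the remote part of the query inside a single SERVICE block is one way to keep queries modular: the block can be swapped for another that fetches, for example, dates instead of labels, while the Wikidata part stays unchanged.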
Correcting mismatches, however, remained predominantly manual, which proved crucial, especially in cases of homonymy: IdRef identifiers, often added to Wikidata through VIAF clusters, are sometimes attached to the wrong individual. Manual curation ensured that errors did not propagate further, particularly across members of a given VIAF cluster, thereby maintaining data integrity. Additionally, the comparison revealed that Wikidata tended to be more accurate and up-to-date than IdRef, showcasing the potential of community-curated databases.
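The comparison step that precedes manual curation can be sketched as follows. This is a minimal illustration, not the project's actual tooling: the PersonRecord type, the compare_sources function, and the choice of birth and death years as compared fields are all hypothetical. The function only reports differences for a human to review and never modifies either source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PersonRecord:
    """Hypothetical minimal record; real records carry many more fields."""
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


# (idref_id, field, kind, wikidata_value, idref_value)
Finding = Tuple[str, str, str, Optional[int], Optional[int]]


def compare_sources(
    wikidata: Dict[str, PersonRecord],
    idref: Dict[str, PersonRecord],
) -> List[Finding]:
    """List differences between records that share an IdRef identifier.

    A "gap" means one source lacks the value; a "conflict" means both
    sources have a value and the values differ.
    """
    findings: List[Finding] = []
    # Only identifiers present in both sources can be compared.
    for idref_id in sorted(wikidata.keys() & idref.keys()):
        wd, ir = wikidata[idref_id], idref[idref_id]
        for field in ("birth_year", "death_year"):
            wd_value, ir_value = getattr(wd, field), getattr(ir, field)
            if wd_value == ir_value:
                continue
            kind = "gap" if wd_value is None or ir_value is None else "conflict"
            findings.append((idref_id, field, kind, wd_value, ir_value))
    return findings
```

A record with conflicts in several fields at once may point to a homonym, that is, an identifier attached to the wrong person, rather than to a simple data error, which is one reason to route findings to a reviewer instead of correcting them automatically.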
This presentation aims to demonstrate the reliability of community-curated databases and the power of federated queries, particularly through the use of SPARQL, to enhance data accuracy and integration across multiple sources. By sharing these insights, we hope to encourage other institutions to adopt similar methodologies to improve their data management practices.