Presenter
Kolja Bailly (@baillyk)
Slides and Recordings
- Slides: 2025_11_19 SWIB25-Wikibase4research RAG.pdf (1.4 MB)
- Recordings: YouTube
Abstract
The Open Science Lab (OSL) at TIB Hannover develops open source solutions for managing research data with Wikibase, an extension of the MediaWiki software suite. This presentation shows how AI-based approaches can be integrated into MediaWiki using Retrieval-Augmented Generation (RAG), a methodology that allows Large Language Models (LLMs) to draw on custom data sources. The llama_index_mediawiki-service is a containerized solution, based on the LlamaIndex framework, that runs an LLM to enhance the usability and accessibility of data hosted on MediaWiki instances. The LLM can use computational resources from remote services such as the Hugging Face API or GWDG SAIA, or it can run locally, which preserves user privacy by keeping all data local. The service answers user queries in natural language with context-aware responses and helps users create SPARQL queries. OSL has updated the service to index data stored in several structured formats, including MediaWiki pages and Wikibase statements. Using LlamaIndex, the service builds a vector index that stores data from the wiki instance in a form that allows comparison by semantic similarity. A demo instance of the service has been applied to a wiki instance containing data about historic manor houses in the Baltic Sea Region, a joint project between OSL and the University of Greifswald. While still in development, this demo is a promising step towards easy-to-use, free-of-charge, open-source LLM integration in MediaWiki.
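The retrieval step behind RAG can be illustrated with a minimal sketch. It does not use LlamaIndex or the service's actual code. As an assumption, a bag-of-words term-count vector stands in for the neural embedding model a real vector index would use, and the page titles and texts are hypothetical placeholders rather than data from the manor house wiki. Each page is embedded once into an index, a query is ranked against the stored vectors by cosine similarity, and the best-matching pages are pasted into the prompt that would then be sent to the LLM.

```python
import math
import re
from collections import Counter

# Hypothetical wiki pages; placeholders for MediaWiki pages / Wikibase statements.
PAGES = {
    "Manor A": "Baroque manor house with a landscape park.",
    "Manor B": "Neo-Gothic manor house converted into a hotel.",
    "Village church": "Medieval brick church in the village.",
}


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase term counts instead of a neural model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(pages: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Store one vector per page so later queries can be compared against it."""
    return [(title, text, embed(text)) for title, text in pages.items()]


def retrieve(index, query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k pages most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(title, text) for title, text, _ in ranked[:k]]


def build_prompt(query: str, hits: list[tuple[str, str]]) -> str:
    """Augment the user's question with the retrieved wiki context."""
    context = "\n".join(f"[{title}] {text}" for title, text in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    index = build_index(PAGES)
    question = "baroque manor with park"
    print(build_prompt(question, retrieve(index, question)))
```

In the real service the final prompt would go to a local or remote LLM; swapping the toy `embed` function for a proper embedding model changes the similarity scores but not the overall index, retrieve, and augment flow.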