🔨 Introduction to the Annif automated indexing tool

Facilitator(s)

Osma Suominen (@Osma), Mona Lehtinen (@Mona), Juho Inkinen (@juhoinkinen), Ghulam Mustafa Majal (@Ghulam_Mustafa_Majal), Argie Kasprzik (@AnkasZBW)

Slides

Slides:

Abstract

Numerous libraries and affiliated organizations are investigating automated techniques for generating metadata, and the role of AI is ever increasing. This workshop provides an overview of the open-source multilingual automated indexing tool Annif (https://annif.org), which can be incorporated e.g. into a library’s metadata creation system. Attendees will have the opportunity to work directly with Annif by installing it, training its algorithms using sample data, and producing subject suggestions for new documents. The workshop covers both simple and advanced scenarios. During the workshop we will also cover some aspects of using Large Language Models for metadata extraction.
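To give a rough idea of what "producing subject suggestions" can look like from another system, here is a minimal Python sketch that talks to an Annif REST API. It is an illustration under assumptions, not part of the workshop material: the base URL `https://api.annif.org/v1`, the project ID `yso-en`, and the helper names `build_suggest_request`, `parse_suggestions` and `suggest` are chosen for the example, and the response layout (a `results` list with `label` and `score` fields) should be checked against the API documentation of the Annif instance you use.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL of a publicly reachable Annif instance.
API_BASE = "https://api.annif.org/v1"


def build_suggest_request(project_id, text, limit=5, base=API_BASE):
    """Build (but do not send) a POST request asking a project for subjects."""
    url = f"{base}/projects/{urllib.parse.quote(project_id)}/suggest"
    body = urllib.parse.urlencode({"text": text, "limit": limit}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def parse_suggestions(payload):
    """Return (label, score) pairs from a JSON response, highest score first."""
    results = json.loads(payload).get("results", [])
    pairs = [(r["label"], r["score"]) for r in results]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def suggest(project_id, text, limit=5):
    """Send the request over the network and return the parsed suggestions."""
    request = build_suggest_request(project_id, text, limit)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_suggestions(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Made-up response used only to show the parsing step without network access.
    sample = '{"results": [{"label": "libraries", "score": 0.4, "uri": "x"}]}'
    print(parse_suggestions(sample))
```

Calling `suggest("yso-en", "some document text")` would send the request, provided a project with that ID exists on the server. Building and parsing are kept separate from the network call so that each part can be tried offline.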

We provide material for self-study before the workshop, which leaves time to focus more on troubleshooting, inquiries, presentations, and discussions during the workshop. Attendees are able to access the instructional videos and exercises from the Annif-tutorial GitHub repository (https://github.com/NatLibFi/Annif-tutorial) in advance.

Participants should have a computer with a minimum of 8 GB of RAM and 20 GB of free disk space. The software will be provided as a preconfigured VirtualBox virtual machine, although Docker images and a Linux installation option are also available. No previous experience with Annif is necessary. However, familiarity with subject vocabularies (such as thesauri or classification systems) and corresponding subject metadata is expected.

:information_source: To register your participation in this workshop click on the “Going” button above. You will then receive an email notification as soon as facilitators post an update. Take care not to register for two parallel workshops.


Welcome to the Annif workshop at SWIB25!

Dear participants,

SWIB is just around the corner, and we would like to give you some general information and more details about the workshop.

The self-study material is already available online at https://github.com/NatLibFi/Annif-tutorial. It consists of exercises that are explained in detail with written instructions, and videos that serve as more practical demonstrations. The purpose of the material is to introduce Annif and its use to a wide audience. Please go through the material carefully beforehand and try to complete at least all the core exercises. We will cover and discuss some of the material and exercises during the workshop, but please try to get as far along as you can beforehand.

Note that the actual workshop will take place over Zoom; the address will be posted right before the workshop.

Preliminary schedule (UTC)

  • 19:00 Start

    • Practicalities & intro

    • Breakout sessions for exercises

  • 18:00-18:10 Break

    • Discussion & more exercises (in the main room)
  • 18:50-19:10 Break

    • Presentations on Annif use

    • Closing words

  • Finish ~20:00

Technical setup and getting started

We recommend a laptop or desktop computer with at least 8 GB of RAM and at least 20 GB of free disk space. As stated in the material, there are four options for getting Annif running for the tutorial (we generally recommend the first two):

  1. VirtualBox: Please install the VirtualBox host software (https://www.virtualbox.org/) for your operating system beforehand. Versions 6.0 and 6.1 (or newer) should work. The basic host install is fine; the Extension Pack or SDK modules are not necessary. We provide a 64-bit Xubuntu-based VirtualBox image, which means that hardware virtualization support must be enabled; on some computers this requires changing BIOS settings. If you don’t have administrative rights to your computer, an administrator must install the VirtualBox software for you. Otherwise this is a relatively simple approach.

  2. Codespaces: In this setup the tutorial is completed using GitHub Codespaces. Annif will be running on a GitHub-hosted machine, which you will access via a terminal in your browser. This setup requires a GitHub account, but admin rights for your computer are not needed.

  3. Docker: For Windows users, make sure to have Docker Desktop Community (https://docs.docker.com/docker-for-windows/install/) installed on your own machine beforehand. This installation will require administrative privileges. For other operating systems, please install the appropriate tools to run Docker images. Also note that for convenience it’s good to be able to share a drive on your computer with Docker (see https://docs.docker.com/docker-for-windows/#shared-drives). We recommend Docker mainly if you have used it before.

  4. Local install: For experienced Linux users only (see https://github.com/NatLibFi/Annif). We can provide some assistance for the installation process if needed but it would be best if you could try performing the install beforehand. Administrative rights may be needed to install some of the required system libraries, although most of the installation is performed as a normal user.
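Before picking one of the options above, it may help to compare your machine against the recommendation of 8 GB of RAM and 20 GB of free disk space. The following Python sketch is only a convenience and not part of the tutorial; the function names and the half-gigabyte tolerance on RAM are assumptions made for this example (operating systems often report slightly less memory than the nominal amount), and the RAM lookup is skipped on platforms where `os.sysconf` cannot provide it.

```python
import os
import shutil

GIB = 1024 ** 3  # bytes per gibibyte


def free_disk_gib(path="."):
    """Free space in GiB on the file system that contains `path`."""
    return shutil.disk_usage(path).free / GIB


def total_ram_gib():
    """Total physical RAM in GiB, or None if the platform cannot report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB
    except (AttributeError, ValueError, OSError):
        return None  # e.g. on Windows, where os.sysconf is not available


def check(ram_gib, disk_gib, min_ram=8, min_disk=20, ram_slack=0.5):
    """Return a list of warnings; an empty list means the recommendation is met."""
    problems = []
    # Unknown RAM (None) is not reported as a problem; check it manually instead.
    if ram_gib is not None and ram_gib < min_ram - ram_slack:
        problems.append(f"RAM: {ram_gib:.1f} GiB found, {min_ram} recommended")
    if disk_gib < min_disk:
        problems.append(f"Disk: {disk_gib:.1f} GiB free, {min_disk} recommended")
    return problems


if __name__ == "__main__":
    warnings = check(total_ram_gib(), free_disk_gib())
    print("\n".join(warnings) if warnings else "Hardware recommendation met.")
```

Run it from the drive where you plan to store the virtual machine image or Docker data, since free space is measured for the current directory.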

If you are unable to get the tutorial running, or if you have any other problems or questions, do not hesitate to contact us as soon as possible so that we’re able to assist you. To contact us via Discourse, please create a New Topic under the Workshops@SWIB25 category and tag it with the label ws-annif. We will respond to posts directly on Discourse and you will get a notification of any replies.

See you soon!

Here is the link to the Zoom room, see you on Monday! :right_arrow:
https://helsinki.zoom.us/j/64658512838?pwd=uO8RPOavRfAANjqWEFtMAQoZaOfY7Z.1

Hi Mona, will this session be recorded? Thanks.

Hi, unfortunately we will not record the session, but all of the material, slides etc., will be made available.

Thank you for the workshop! Our use case slides for the National Library of Finland are available through the link. You can leave feedback for us via the feedback form.

Thank you very much for participating in this workshop. You can find the Annif use case slides for ZBW attached to this message.

AUTOSE-ZBW-SWIB2025.pdf (834.9 KB)