Many libraries and related organizations are exploring automated methods for metadata creation. This workshop offers an introduction to the multilingual automated indexing tool, Annif (annif.org), which can be integrated into a library’s metadata production system. Participants will gain hands-on experience with Annif by setting it up, training its algorithms with sample data, and generating subject suggestions for new documents. The workshop includes both basic and complex scenarios.
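To give a rough idea of what integrating Annif into a metadata workflow can look like, here is a minimal Python sketch that prepares a request to the suggest method of Annif's REST API and ranks the returned subjects. The base URL `http://localhost:5000/v1` and the project id `my-project` are assumptions for illustration only (use whatever your own installation provides), and the helper names `build_suggest_request`, `top_labels` and `suggest` are hypothetical, not part of Annif.

```python
import json
import urllib.parse
import urllib.request


def build_suggest_request(base_url, project_id, text, limit=5):
    """Build a POST request for the suggest method of an Annif REST API.

    base_url and project_id are placeholders supplied by the caller;
    nothing is sent over the network here.
    """
    url = f"{base_url.rstrip('/')}/projects/{urllib.parse.quote(project_id)}/suggest"
    # The document text and result limit are sent as form-encoded fields.
    body = urllib.parse.urlencode({"text": text, "limit": limit}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def top_labels(response_json):
    """Return (label, score) pairs from a suggest response, best score first."""
    results = response_json.get("results", [])
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["label"], r["score"]) for r in ranked]


def suggest(base_url, project_id, text, limit=5):
    """Send the request and return ranked suggestions (needs a running Annif)."""
    request = build_suggest_request(base_url, project_id, text, limit)
    with urllib.request.urlopen(request) as response:
        return top_labels(json.load(response))
```

With an Annif instance running and a trained project available, calling `suggest("http://localhost:5000/v1", "my-project", "some document text")` would return a list of label and score pairs. The tutorial exercises themselves mostly use the command line, so treat this only as an illustration of the integration idea.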
Before the event, participants have access to instructional videos and exercises from the Annif-tutorial GitHub repository. The material is intended for self-study before the workshop, which leaves more time for troubleshooting, questions, and discussion during the workshop itself.
Participants should have a computer with a minimum of 8 GB of RAM and 20 GB of free disk space. The software will be provided as a preconfigured VirtualBox virtual machine, though Docker images and a Linux installation option are also available. No previous experience with Annif is necessary. However, familiarity with subject vocabularies (such as thesauri or classification systems) and the corresponding subject metadata is expected.
To register for this workshop, click on the “Going” button above. You will then receive an email notification whenever the facilitators post an update. Please take care not to register for two parallel workshops.
SWIB is just around the corner, and we would like to give you some general information and more details about the workshop.
The self-study material is already available online at https://github.com/NatLibFi/Annif-tutorial. It consists of exercises that are explained in detail with written instructions, and videos that serve as more practical demonstrations. The purpose of the material is to introduce Annif and its use to a wide audience. Please go through the material carefully beforehand and try to complete at least all the core exercises. We will cover and discuss some of the material and exercises during the workshop, but please try to get as far along as you can beforehand.
How Annif is used at the National Library of Finland
How Annif is used at ZBW
Break 10:15–10:30
Closing & discussion about next steps and the future of Annif
We will wrap up before 13:00.
Technical setup and getting started
We recommend a laptop or desktop computer with at least 8 GB of RAM and at least 20 GB of free disk space. As stated in the material, there are four options for getting Annif running for the tutorial (we generally recommend the first two):
VirtualBox: Please install the VirtualBox host software (https://www.virtualbox.org/) for your operating system beforehand. Versions 6.0 and 6.1 (or newer) should work. The basic host install is fine; the Extension Pack or SDK modules are not necessary. We provide a 64-bit Xubuntu-based VirtualBox image, which means that there should be no need to change BIOS settings to enable virtualization support. If you don’t have administrative rights to your computer, an administrator must install the VirtualBox software for you. Otherwise this is a relatively simple approach.
Codespaces: In this setup the tutorial is completed using GitHub Codespaces. Annif will be running on a GitHub-hosted machine, which you will access via a terminal in your browser. This setup requires a GitHub account, but admin rights for your computer are not needed.
Docker: For Windows users, make sure to have Docker Desktop Community (https://docs.docker.com/docker-for-windows/install/) installed on your own machine beforehand. This installation will require administrative privileges. For other operating systems, please install the appropriate tools to run Docker images. Also note that for convenience it’s good to be able to share a drive on your computer with Docker (see https://docs.docker.com/desktop/). We recommend Docker mainly if you have used it before.
Local install: For experienced Linux users only (see https://github.com/NatLibFi/Annif). We can provide some assistance for the installation process if needed but it would be best if you could try performing the install beforehand. Administrative rights may be needed to install some of the required system libraries, although most of the installation is performed as a normal user.
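Whichever option you choose, it can be useful to check your machine against the recommendation of at least 8 GB of RAM and 20 GB of free disk space first. The following is a minimal Python sketch for that, not an official part of the tutorial. The function names are hypothetical, 1 GB is treated as 10^9 bytes (an assumption), and RAM detection relies on `os.sysconf`, which is not available on every platform (for example Windows), in which case only the disk check is performed.

```python
import os
import shutil

GB = 10 ** 9            # assumption: decimal gigabytes
MIN_RAM_GB = 8          # recommended minimum RAM
MIN_FREE_DISK_GB = 20   # recommended minimum free disk space


def total_ram_bytes():
    """Best-effort total RAM in bytes, or None if it cannot be determined."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None


def free_disk_bytes(path="."):
    """Free space in bytes on the file system that contains path."""
    return shutil.disk_usage(path).free


def shortfalls(ram_bytes, disk_bytes):
    """Return a list of messages for every recommendation that is not met.

    A ram_bytes value of None means "unknown" and is skipped.
    """
    problems = []
    if ram_bytes is not None and ram_bytes < MIN_RAM_GB * GB:
        problems.append(f"RAM below {MIN_RAM_GB} GB")
    if disk_bytes < MIN_FREE_DISK_GB * GB:
        problems.append(f"free disk space below {MIN_FREE_DISK_GB} GB")
    return problems
```

Calling `shortfalls(total_ram_bytes(), free_disk_bytes("."))` returns an empty list when both recommendations are met, and a list of short messages otherwise. Pass the path of the drive where you plan to store the virtual machine or Docker data, since free space is measured per file system.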
If you are unable to get the tutorial running, or if you have any other problems or questions, do not hesitate to contact us as soon as possible so that we’re able to assist you. To contact us via Discourse, please create a New Topic under the Workshops@SWIB24 category and tag it with the label ws-annif. We will respond to posts directly on Discourse and you will get a notification of any replies.
The Annif workshop will be in less than a week. I hope you’ve all gotten started on the setup and exercises - and if you have any problems or questions, don’t hesitate to contact us! (See instructions above.)
If all is well, please react to this message with a thumbs-up or similar. That will give us organizers a nice warm feeling.
A note to participants: downloading and converting the PDF documents in step 7 of “Exercise 2: Set up and train a TFIDF project” can take a long time so you may want to start it early on.
Thank you for the workshop! You can find the slides on NLF use cases of Annif here. If you would like the recording of the last part of the workshop, please contact me.
Thank you to all the participants! I am attaching to this message the use case slides on how ZBW uses Annif in our production service. AUTOSE-ZBW-SWIB2024.pdf (1.2 MB)