Improving data quality in Europeana: The Universitätsbibliothek Heidelberg case study

by Pierre-Edouard Barrault

Since the foundation of various libraries affiliated with the university in the 14th and 15th centuries, the Universitätsbibliothek Heidelberg has housed many important collections. The datasets it has delivered to Europeana hold over 25,000 records, including treasures such as manuscripts, manuscript fragments, charters, and early modern and modern printed books and magazines. The objects cover a wide range of subject areas and languages, among them French, German, Italian, and Spanish. All the material is available under a CC BY-SA license, which allows free re-use provided the source is attributed and derivative works are shared under the same terms.

Following the implementation of the International Image Interoperability Framework (IIIF) in Europeana, we worked on ingesting material that complies with these new standards. As the Universitätsbibliothek Heidelberg had already included IIIF resources in its metadata, it proved to be a good candidate. We therefore focussed on this institution in the hope of improving its collections in Europeana thanks to IIIF. The work, which has been published as a case study, went further than planned: the approach taken also allowed us to improve the quality of the data itself.

A data quality plan was set up to improve the metadata of the two datasets the Universitätsbibliothek had delivered to Europeana (07931 and 07932). We analysed the data and then worked on providing researchers with more meaningful metadata.

A variety of improvements were implemented, including:

  • Standardisation of mandatory or meaningful fields (e.g. date, type, and subject)

  • Populating additional fields (e.g. description, spatial information)

  • Standardisation of creator and contributor fields

  • Incorporation of hierarchical relationships


Source: Graf Konrad von Kirchberg on f. 23r, Große Heidelberger Liederhandschrift (Codex Manesse), 1300-1340, Germany, Universitätsbibliothek Heidelberg, CC BY-SA.

These datasets are of particular interest not only to digital humanists but also to palaeographers, codicologists, book historians, philologists, medieval and early modern historians, and many other scholars.

Thanks to these data quality improvements, the datasets now support a much wider range of use cases for researchers, whether they want to carry out quantitative or qualitative research on the objects. Researchers can also use the Europeana API to query and harvest selected records for data analysis or visualisation. Furthermore, the exposed IIIF resources make it possible to take an object's manifest and load it into a IIIF-compliant viewer for side-by-side comparison, annotation, transcription, or reconstruction.
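The API and IIIF workflow described above could look roughly like the following Python sketch, which harvests the records of one dataset and turns their ids into IIIF manifest URLs that can be opened in a compliant viewer. It is a minimal sketch under stated assumptions, not Europeana's official client: the Search API endpoint, filtering on the `edm_datasetName` field, cursor pagination via `nextCursor`, and the `iiif.europeana.eu/presentation/.../manifest` URL pattern are all assumptions to verify against the current Europeana API documentation. A valid API key (`wskey`) is required for real requests, and the record id in the demo is hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterator

# ASSUMPTION: endpoint of the Europeana Search API; verify against current docs.
SEARCH_ENDPOINT = "https://api.europeana.eu/record/v2/search.json"
# ASSUMPTION: base URL Europeana uses for IIIF Presentation manifests.
MANIFEST_BASE = "https://iiif.europeana.eu/presentation"


def build_search_url(wskey: str, dataset_id: str, rows: int = 100, cursor: str = "*") -> str:
    """Return a Search API URL for one page of records from a dataset such as "07931".

    ASSUMPTION: records can be filtered on the `edm_datasetName` field with a
    wildcard after the dataset number.
    """
    params = {
        "wskey": wskey,
        "query": f"edm_datasetName:{dataset_id}*",
        "rows": rows,
        "cursor": cursor,  # "*" requests the first page of a cursor-based harvest
    }
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)


def manifest_url(record_id: str) -> str:
    """Map a record id like "/07931/example" to its (assumed) IIIF manifest URL."""
    return f"{MANIFEST_BASE}/{record_id.strip('/')}/manifest"


def manifests_from_page(page: dict) -> list[str]:
    """Extract manifest URLs from one parsed Search API response page."""
    return [manifest_url(item["id"]) for item in page.get("items", [])]


def harvest(wskey: str, dataset_id: str, rows: int = 100) -> Iterator[dict]:
    """Yield every item of a dataset, following `nextCursor` until it is absent."""
    cursor = "*"
    while cursor:
        url = build_search_url(wskey, dataset_id, rows, cursor)
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
        items = page.get("items", [])
        yield from items
        # Stop on an empty page as well, so a stale cursor cannot loop forever.
        cursor = page.get("nextCursor") if items else None


if __name__ == "__main__":
    # Offline demo with a hypothetical record id; no network access needed.
    sample_page = {"items": [{"id": "/07931/example_record"}]}
    print(manifests_from_page(sample_page))
```

Run offline, the demo prints `['https://iiif.europeana.eu/presentation/07931/example_record/manifest']`. With a real key, `for item in harvest("YOUR_KEY", "07931")` would stream the records for analysis, and each `manifest_url(item["id"])` could be pasted into a IIIF viewer for side-by-side comparison. Cursor pagination is used rather than a numeric `start` offset because it is the usual way to walk a result set larger than a single page.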

Explore the Universitätsbibliothek Heidelberg's treasures in Europeana Collections!