Raw audio features from the Europeana Sounds collection

10 March 2017

The Europeana Sounds project has worked for the last three years to make collections of sound archives from around Europe available online. All the material related to music can be found in Europeana's dedicated thematic collection: Europeana Music. So far over 330,000 pictures, texts, and sound files can be accessed there.

Users of Europeana Music benefit from being able to search for specific music genres (e.g. free jazz, Irish folk, baroque) to find what they are looking for in this vast amount of material. However, this information is not always available in the data. Currently, only about a fifth of the Europeana Music Collection has been labelled with a unified genre description. Even in those cases the genre classification is often very general, because it was applied at the collection level rather than chosen for each individual piece.

To improve the quality of the genre information, we organised a genre detection challenge on the 1st of October 2016 in Vienna. The outcomes are described in a blog post on Europeana Pro.

These audio feature files capture specific characteristics of the sound files, which machine learning methods can use to detect genres. A good place to start is the example code below.


Example Code

Jupyter / IPython Notebook (0.1MB)
Jupyter / IPython Notebook (HTML converted) (0.3MB)


metadata.csv (18MB)
genres.txt (0.1MB) - a list of genre labels from the Europeana Sounds genre taxonomy.

Audio Features

Raw audio features (directory structured; one file per track)

ssd.zip (42MB)
rp.zip (328MB)
tssd.zip (255MB)
rh.zip (19MB)
trh.zip (98MB)
mvd.zip (93MB)

mfcc_aggregated.zip (25MB)
chroma_aggregated.zip (18MB)
rmse_aggregated.zip (10MB)
spectral_bandwidth_aggregated.zip (11MB)
spectral_centroid_aggregated.zip (11MB)
spectral_contrast_aggregated.zip (18MB)
spectral_rolloff_aggregated.zip (11MB)
tonnetz_aggregated.zip (15MB)
zero_crossing_rate_aggregated.zip (11MB)
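
Because the raw feature archives are directory structured with one file per track, a first step is often to build an index of which archive entry belongs to which track. The sketch below uses only the Python standard library. It assumes that the file name without its extension identifies the track, which should be checked against the real archives. The archive paths used in the demo (ssd/provider_a/track_001.csv and so on) are hypothetical stand-ins, not the actual layout.

```python
import io
import zipfile
from pathlib import PurePosixPath


def index_tracks(archive):
    """Map each file's stem to its path inside the zip archive.

    ASSUMPTION: the file name without extension identifies the track.
    """
    index = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip pure directory entries
            path = PurePosixPath(info.filename)
            index[path.stem] = info.filename
    return index


def make_demo_archive():
    """Build a tiny in-memory zip with hypothetical paths and contents."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("ssd/provider_a/track_001.csv", "0.1,0.2\n")
        zf.writestr("ssd/provider_b/track_002.csv", "0.3,0.4\n")
    buffer.seek(0)
    return buffer


demo_index = index_tracks(make_demo_archive())
```

For a downloaded archive, a file path such as "ssd.zip" can be passed to index_tracks in place of the in-memory demo buffer.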

Cumulative audio features (one CSV file per feature for the entire collection)

ssd.csv.gz (30MB)
rp.csv.gz (256MB)
tssd.csv.gz (208MB)
rh.csv.gz (11MB)
trh.csv.gz (77MB)
mvd.csv.gz (72MB)

mfcc.csv.gz (15MB)
chroma.csv.gz (8MB)
rmse.csv.gz (1MB)
spectral_bandwidth.csv.gz (1MB)
spectral_centroid.csv.gz (1MB)
spectral_contrast.csv.gz (8MB)
spectral_rolloff.csv.gz (11MB)
tonnetz.csv.gz (6MB)
zero_crossing_rate.csv.gz (1MB)
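
As a rough illustration of how one of these cumulative files might feed a genre classifier, the sketch below reads a gzip-compressed CSV and trains a very simple nearest-centroid model using only the Python standard library. This is not the method of the example notebook. It assumes that each row holds a track identifier followed by numeric feature values, with no header row, and that genre labels are available as a mapping from track identifier to genre (for instance derived from metadata.csv). Both assumptions need to be verified against the real files. The demo data, track identifiers and genre names are made up.

```python
import csv
import gzip
import io
import math
from collections import defaultdict


def read_features(source):
    """Read 'track_id,f1,f2,...' rows from a gzip-compressed CSV.

    ASSUMPTION: first column is a track identifier, the rest are numeric
    features, and there is no header row.
    """
    features = {}
    with gzip.open(source, mode="rt", newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue
            features[row[0]] = [float(value) for value in row[1:]]
    return features


def train_centroids(features, labels):
    """Average the feature vectors of each genre (nearest-centroid model)."""
    sums = {}
    counts = defaultdict(int)
    for track_id, genre in labels.items():
        vector = features.get(track_id)
        if vector is None:
            continue  # labelled track without features
        if genre not in sums:
            sums[genre] = [0.0] * len(vector)
        sums[genre] = [s + v for s, v in zip(sums[genre], vector)]
        counts[genre] += 1
    return {g: [s / counts[g] for s in total] for g, total in sums.items()}


def predict(centroids, vector):
    """Return the genre whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda g: math.dist(centroids[g], vector))


def make_demo_file():
    """Build a tiny in-memory stand-in for a file such as ssd.csv.gz."""
    rows = [
        ("t1", 0.1, 0.2), ("t2", 0.2, 0.1),  # made-up "genre_a" tracks
        ("t3", 0.9, 1.0), ("t4", 1.0, 0.9),  # made-up "genre_b" tracks
        ("t5", 0.15, 0.15),                  # unlabelled track
    ]
    text = "".join(",".join(str(x) for x in row) + "\n" for row in rows)
    return io.BytesIO(gzip.compress(text.encode("utf-8")))


demo_features = read_features(make_demo_file())
demo_labels = {"t1": "genre_a", "t2": "genre_a",
               "t3": "genre_b", "t4": "genre_b"}
demo_centroids = train_centroids(demo_features, demo_labels)
demo_prediction = predict(demo_centroids, demo_features["t5"])
```

A nearest-centroid model is used here only because it needs no external libraries. For real experiments, a dedicated machine learning library and a proper train/test split would be more appropriate.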

by Joris Pekel, Europeana, and Alexander Schindler, Austrian Institute of Technology

  • Item Type

    Audio, Dataset

  • Language Coverage

    Multiple languages

  • Spatial Coverage

    Multiple Areas

  • Time Coverage

    Multiple Eras