Experimental text dumps from Europeana Newspapers

20 July 2015

As part of the Europeana Newspapers project, millions of words of public-domain text were created by running OCR on the historic newspapers that the partner libraries made available.

After aggregating the text and making it searchable via The European Library, we are now making the raw text available.

As an experiment, runs of three newspapers have been made available.

We are interested to know whether this is a useful way of getting access to the text. What other metadata or information should be presented? Should the text and folders be organised differently? Please let us know in the comments below, via @eurresearch, or by emailing Alastair Dunning.


Berliner Tageblatt
Available Years: 1878-1929
Source Library: Staatsbibliothek zu Berlin
Downloadable at: http://data.theeuropeanlibrary.org/download/newspapers/berliner_tageblatt/
Searchable at: http://www.theeuropeanlibrary.org/tel4/newspapers/title/3000096302605
Licence for Text: https://creativecommons.org/publicdomain/mark/1.0/

Jaunākās Ziņas
Available Years: 1911-1936
Source Library: National Library of Latvia
Downloadable at: http://data.theeuropeanlibrary.org/download/newspapers/jaunakas_zinas/
Searchable at: http://www.theeuropeanlibrary.org/tel4/newspapers/title/3000059923367
Licence for Text: https://creativecommons.org/publicdomain/mark/1.0/

L'Univers
Available Years: 1867-1920
Source Library: National Library of France
Downloadable at: http://data.theeuropeanlibrary.org/download/newspapers/l_univers/
Searchable at: http://www.theeuropeanlibrary.org/tel4/newspapers/title/3000113983483
Licence for Text: https://creativecommons.org/publicdomain/mark/1.0/

Additionally, the root directory is available here.
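The "Downloadable at" addresses above end in a slash and look like plain web directories. As a minimal sketch, assuming each one returns a standard HTML directory index made of ordinary links (this post does not confirm the format), the Python below collects the absolute URLs of the files and sub-folders beneath one of them. The helper names `LinkCollector`, `list_entries` and `fetch_index` are hypothetical, and the layout and file format inside each folder are not described here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: this URL serves an HTML directory index with <a href="..."> entries.
BASE_URL = "http://data.theeuropeanlibrary.org/download/newspapers/berliner_tageblatt/"


class LinkCollector(HTMLParser):
    """Collect href values from every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def list_entries(html, base_url):
    """Return absolute URLs of the entries listed beneath base_url.

    Skips column-sorting links such as "?C=N;O=D" and any link that
    resolves outside base_url (for example "Parent Directory").
    """
    parser = LinkCollector()
    parser.feed(html)
    entries = []
    for href in parser.links:
        if href.startswith("?"):
            continue
        absolute = urljoin(base_url, href)
        if absolute.startswith(base_url) and absolute != base_url:
            entries.append(absolute)
    return entries


def fetch_index(url):
    """Download a directory index page as text (requires network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Example usage, if the server is reachable:
# for entry in list_entries(fetch_index(BASE_URL), BASE_URL):
#     print(entry)
```

Parsing is kept separate from downloading so the link filtering can be checked against a saved index page; calling `list_entries` again on each sub-folder URL would walk the whole run.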