Home
Welcome to the Reichsanzeiger wiki!
This work was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 422758594 (2019–2024)
The digital edition of the Reichsanzeiger is prepared by Mannheim University Library. Report any issues here.
This page reports changes to the digital presentation.
Thanks to funding from the Deutsche Forschungsgemeinschaft (DFG), a new online edition for the years 1871–1945 (Deutsches Reich, Weimarer Republik, NS-Zeit) has been prepared. It is also available in the Deutsches Zeitungsportal of the German Digital Library.
The years 1828 and 1829 have now been completed with scans of paper copies.
For most issues the date is now shown.
Fess had a configuration setting that told it to delete any index older than 300 days, and it has now done so. It is therefore building a new index, and until that process has finished, results from the Fess search will be incomplete.
OCR of the Reichsanzeiger had just been finished when Ray Smith published newly trained OCR models for Tesseract. These models are very promising because they improve OCR for Fraktur considerably, although there are also some regressions (a missing paragraph character, and bugs such as ß/B confusion in the word list). As soon as those bugs are fixed, there will be a new round of OCR. Compare some new results with the old ones.
OCR for all images is finished! We now have more than 360,000 text files produced by Tesseract.
For a few days now there has also been a new experimental search index which allows fuzzy searches that tolerate some of the errors made by OCR. It uses the Fess Enterprise Search Server. (2017-08-02: updated URL)
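Fuzzy search of this kind is commonly based on edit distance: a query term also matches words that differ from it by a few single-character insertions, deletions or substitutions, which is typical of OCR errors (for example ß read as B). The Python sketch below only illustrates that idea; it is not how Fess implements fuzzy queries, and the function names and the one-edit threshold are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or keep)
        prev = cur
    return prev[-1]


def fuzzy_find(term: str, words: list[str], max_edits: int = 1) -> list[str]:
    """Return the OCR words within max_edits edits of the search term,
    compared case-insensitively. max_edits=1 is an illustrative choice."""
    t = term.lower()
    return [w for w in words if levenshtein(t, w.lower()) <= max_edits]


if __name__ == "__main__":
    # "StraBe" is the kind of ß/B confusion OCR produces; it is one edit away.
    ocr_words = ["StraBe", "Straße", "Strasse", "Stärke"]
    print(fuzzy_find("Straße", ocr_words))
```

A larger `max_edits` finds more misrecognised words but also more unrelated ones, so real search engines usually keep the allowed distance small and may scale it with word length.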
We now have nearly 340,000 scans processed by OCR, and most of them are already in our search index.
Now more than 250,000 scans can be searched locally. The search is based on Xapian Omega.
A week ago we restarted OCR with an improved Fraktur model for Tesseract (still based on Tesseract 3.05 technology, so not using LSTM). Four compute servers run 72 Tesseract processes (32+16+16+8) simultaneously. So far this has increased the number of scans covered by OCR from 54,000 to more than 120,000, and we hope to have processed all scans within a few weeks.
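Spreading the scans over the four servers can be sketched as below. This is a hypothetical illustration, not the script actually used: the server names are invented, `frk` as the Tesseract language name is an assumption, and only the process counts (32+16+16+8 = 72) come from the text. Scans are dealt out round-robin over the 72 process slots, so each server receives work in proportion to its number of Tesseract processes.

```python
# Hypothetical server names; the process counts are the ones given above.
SERVERS = {"server-a": 32, "server-b": 16, "server-c": 16, "server-d": 8}


def distribute(scans, servers):
    """Assign scans round-robin over individual process slots, so every
    server gets a share proportional to its number of Tesseract processes."""
    slots = [name for name, count in servers.items() for _ in range(count)]
    assignment = {name: [] for name in servers}
    for i, scan in enumerate(scans):
        assignment[slots[i % len(slots)]].append(scan)
    return assignment


def tesseract_command(image, lang="frk"):
    """Command line for one OCR job: tesseract IMAGE OUTPUTBASE -l LANG.
    Tesseract appends .txt to OUTPUTBASE; the language name is an assumption."""
    base = image.rsplit(".", 1)[0]
    return ["tesseract", image, base, "-l", lang]


if __name__ == "__main__":
    scans = [f"scan{n:06d}.jpg" for n in range(720)]
    for name, items in distribute(scans, SERVERS).items():
        print(name, len(items))
    print(tesseract_command(scans[0]))
```

With 720 scans, each of the 72 slots receives 10, so the largest server gets 320 scans and the smallest 80. Round-robin over slots is the simplest proportional scheme; a shared job queue would balance better when individual scans take very different amounts of time.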
The search service offered by digi.bib.uni-mannheim.de had to be stopped because increased usage, together with more than twice as much OCR data, required too many server resources for the approximate search.
Problems (bad scans, missing journal issues) and other notes related to the digital presentation are documented here.