jvdzwaan/ocrpostcorrection-notebooks


ocrpostcorrection

In 2017 and 2019, the ICDAR Competition on Post-OCR Text Correction was organized. This repository contains the 'working' notebooks for reproducing the best results of the competition and possibly improving on them. The code in the notebooks uses functionality from the ocrpostcorrection package.

Install dependencies

git clone https://github.com/jvdzwaan/ocrpostcorrection.git
cd ocrpostcorrection
pip install -e .

How to use

This repository contains two sets of notebooks:

  • local notebooks to be run locally, e.g., for generating datasets
  • colab notebooks to be run on machines with a GPU, e.g., for training neural networks

    ocrpostcorrection
    ├── LICENSE
    ├── README.md
    ├── colab                                      <- Notebooks to be run on GPU
    │   ├── icdar-task1-hf-evaluation.ipynb        <- Evaluate Hugging Face BERT model for error detection
    │   ├── icdar-task1-hf-train.ipynb             <- Train Hugging Face BERT model for error detection
    │   ├── icdar-task2-seq2seq-evaluation.ipynb   <- Evaluate performance of error correction model
    │   └── icdar-task2-train-seq2seq.ipynb        <- Train error correction model
    └── local                                      <- Notebooks to be run locally
        ├── data                                   <- Data generated and/or used by local notebooks
        ├── evalTool_ICDAR2017.py                  <- ICDAR competition evaluation script
        ├── icdar-create-hf-dataset.ipynb          <- Create Hugging Face dataset from the ICDAR data
        ├── icdar-task2-create-dataset.ipynb       <- Create error correction dataset from the ICDAR data
        ├── icdar-task2-results-analysis.ipynb     <- Preliminary analysis of error correction results
        └── perfect_task1+2_output_analysis.ipynb  <- Analysis of evalTool script for measuring performance
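The notebooks split the work into error detection (task 1) and error correction (task 2). Below is a minimal, hypothetical sketch of how training examples for both tasks could be derived from one aligned OCR/gold-standard text pair. It is not the ocrpostcorrection package's implementation. It assumes that the aligned strings have equal length, that `@` marks alignment gaps (our understanding of the ICDAR aligned files), and that tokens are whitespace-separated in the OCR text. `Token`, `tokenize_aligned`, `detection_labels` and `correction_pairs` are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: '@' is the padding character used in the aligned strings.
GAP = "@"


@dataclass
class Token:
    ocr: str    # OCR token with alignment gaps removed
    gs: str     # aligned gold-standard text with gaps removed
    start: int  # offset of the token in the aligned OCR string

    @property
    def is_error(self) -> bool:
        return self.ocr != self.gs


def tokenize_aligned(ocr_aligned: str, gs_aligned: str) -> list[Token]:
    """Split the aligned OCR text on spaces and pair each token with the
    gold-standard characters at the same aligned positions.

    Simplification: characters aligned to spaces in the OCR text are ignored,
    so e.g. a wrongly inserted space is not detected by this sketch."""
    if len(ocr_aligned) != len(gs_aligned):
        raise ValueError("aligned strings must have equal length")
    tokens: list[Token] = []
    start = None
    # Append a sentinel space so the last token is also closed.
    for i, ch in enumerate(ocr_aligned + " "):
        if ch == " ":
            if start is not None:
                ocr = ocr_aligned[start:i].replace(GAP, "")
                gs = gs_aligned[start:i].replace(GAP, "")
                tokens.append(Token(ocr, gs, start))
                start = None
        elif start is None:
            start = i
    return tokens


def detection_labels(tokens: list[Token]) -> list[int]:
    """Task 1 style labels: 1 if the OCR token contains an error, else 0."""
    return [int(t.is_error) for t in tokens]


def correction_pairs(tokens: list[Token]) -> list[tuple[str, str]]:
    """Task 2 style (input, target) pairs, only for erroneous tokens."""
    return [(t.ocr, t.gs) for t in tokens if t.is_error]


if __name__ == "__main__":
    tokens = tokenize_aligned("Tbe qu1ck br@wn fox", "The quick brown fox")
    print(detection_labels(tokens))  # [1, 1, 1, 0]
    print(correction_pairs(tokens))  # [('Tbe', 'The'), ('qu1ck', 'quick'), ('brwn', 'brown')]
```

Deriving both tasks from the same token list keeps the detection labels and the correction pairs consistent with each other; a real pipeline would also need to handle split and merged words and map tokens to subword units for a BERT model.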
