forked from tblock/10kGNAD

Ten Thousand German News Articles Dataset for Topic Classification

phonosync/10kGNAD

Ten Thousand German News Articles Dataset

For more information, visit the detailed project page.

  1. Install the required Python packages: pip install -r requirements.txt
  2. Download the corpus.sqlite3 file into the project root from here (compressed) or directly from here.
  3. Run python code/extract_dataset_from_sqlite.py corpus.sqlite3 articles.csv to extract the articles.
  4. Run python code/split_articles_into_train_test.py to split the dataset.
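To illustrate what the final split step is meant to achieve, here is a minimal sketch of a per-label (stratified) train/test split using only the standard library. It is not the code in split_articles_into_train_test.py. The function name stratified_split, the 10% test fraction, the seed, and the (label, text) row layout are all assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.1, seed=42):
    """Split (label, text) rows into train and test lists, label by label.

    Hypothetical helper for illustration only; the test fraction and seed
    are assumed values, not taken from the repository's script.
    """
    # Group the rows by their topic label.
    by_label = defaultdict(list)
    for label, text in rows:
        by_label[label].append((label, text))

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Take at least one test row per label; a label with a single
        # row therefore ends up in the test set only.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy data standing in for the extracted articles.
rows = [("Sport", f"sport article {i}") for i in range(20)]
rows += [("Web", f"web article {i}") for i in range(10)]

train, test = stratified_split(rows)
print(len(train), len(test))
```

Splitting per label keeps the topic proportions of the test set close to those of the full dataset, which matters when some topics are much rarer than others.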

License

All code in this repository is licensed under the MIT License.

The dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
