GC News: Content Analysis with regard to Populism of German News Media during the Covid-19 pandemic

This repository holds the data, code, and results for a master's thesis at the Hochschule für Politik at the Technical University of Munich. It includes all ground truth labels and a fully labeled version of the GC News data. To comply with copyright laws, the journalistic contents were removed from the GC News data and are available for replication upon request. The repository further includes the Python and R code used for model creation, fine-tuning, and evaluation, as well as for regression analysis and data visualization.

Data

A labeled version of the entire corpus of German news paragraphs is included in data/gcnews/gc_news_labeled.csv. To comply with copyright laws, the journalistic contents were removed from this file. Together with the labels and other meta information, this file can be used to reproduce the regression and data analysis conducted in this study. A second file, data/gcnews/gc_news_unlabeled_sample1000.csv, contains 1,000 randomly sampled paragraphs from the GC News dataset and can be used to understand, track, and reproduce the labeling process.
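As a first check when reproducing the analysis, it can help to inspect how the labels are distributed in the labeled file. The following is a minimal sketch using only the Python standard library. The column name `label` is a hypothetical placeholder, since the actual column names are not documented here and should be read from the CSV header.

```python
import csv
from collections import Counter


def label_counts(csv_path, label_column):
    """Count how often each value occurs in one column of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return Counter(row[label_column] for row in reader)


if __name__ == "__main__":
    # "label" is an assumed column name; replace it with the real one.
    print(label_counts("data/gcnews/gc_news_labeled.csv", "label"))
```

Values are returned as strings, exactly as they appear in the file, so no assumption is made about how the labels are encoded.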

The data/ground_truth/labeled_data.csv and data/ground_truth/labeled_data_unbalanced.csv files contain a balanced and a stratified (unbalanced) version of the ground truth labels that were used to train, fine-tune, and evaluate the five implemented classification methods.

The data in data/external includes the daily 7-day incidence and fatality numbers of Covid-19 in Germany, as recorded by the Robert Koch Institute. The data was initially published in

Robert Koch-Institut (2023): 7-Tage-Inzidenz der COVID-19-Fälle in Deutschland, Berlin: Zenodo. DOI: 10.5281/zenodo.10207072
Robert Koch-Institut (2023): COVID-19-Todesfälle in Deutschland, Berlin: Zenodo. DOI: 10.5281/zenodo.10207073

Text Classification

Dictionaries

This publication applies three dictionaries via the multidictR and popdictR packages. Together with Gründl's dictionary on populism, these packages were published in

Gründl, J. (2022). Populist ideas on social media: A dictionary-based measurement of populist communication. New Media & Society, 24(6), 1481–1499. https://doi.org/10.1177/1461444820976970

The thiele_covid_terms.csv and thiele_econ_terms.csv dictionaries were originally published in

Thiele, D. (2022). Pandemic Populism? How Covid-19 Triggered Populist Facebook User Comments in Germany and Austria. Politics and Governance, 10(1), 185–196. https://doi.org/10.17645/pag.v10i1.4712
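To illustrate the general idea of dictionary-based classification, the sketch below counts pattern matches in a paragraph and flags it once a threshold is reached. It is written in Python for illustration only and is not the multidictR or popdictR implementation. It assumes a dictionary can be represented as a list of regular-expression patterns, and the patterns, function names, and threshold are hypothetical and not taken from any of the published dictionaries.

```python
import re

# Hypothetical toy patterns, NOT entries from the published dictionaries.
TOY_PATTERNS = [r"\belite\w*", r"\bvolksverr\w*", r"\bkorrupt\w*"]


def count_matches(text, patterns):
    """Return the total number of case-insensitive pattern matches in text."""
    return sum(len(re.findall(p, text, flags=re.IGNORECASE)) for p in patterns)


def label_paragraph(text, patterns, threshold=1):
    """Return 1 if the paragraph reaches the match threshold, else 0."""
    return int(count_matches(text, patterns) >= threshold)


if __name__ == "__main__":
    paragraph = "Die korrupten Eliten belügen das Volk."
    print(count_matches(paragraph, TOY_PATTERNS))
    print(label_paragraph(paragraph, TOY_PATTERNS))
```

The threshold of one match is an arbitrary choice for this sketch.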

Compressor-based Classification

The code in the text_classification/gzip folder implements a compressor-based k-NN classifier and relies on Jiang et al.'s npc_gzip library. The Python package required for reproduction is available on PyPI via

pip install npc-gzip

and was published in

Jiang, Z., Yang, M., Tsirlin, M., Tang, R., Dai, Y., & Lin, J. (2023). “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors. Findings of the Association for Computational Linguistics: ACL 2023, 6810–6828. https://aclanthology.org/2023.findings-acl.426
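The idea behind a compressor-based k-NN classifier can be sketched with the Python standard library alone. The example below does not use the npc_gzip API and is not the code in text_classification/gzip. It is a simplified illustration that computes the normalized compression distance (NCD) between two texts with gzip and assigns the majority label of the k nearest training texts. The function names, the toy data, and the choice of k are assumptions made for this sketch.

```python
import gzip
from collections import Counter


def compressed_len(text):
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a, b):
    """Normalized compression distance between two texts."""
    ca, cb = compressed_len(a), compressed_len(b)
    cab = compressed_len(a + " " + b)  # texts are joined with a space
    return (cab - min(ca, cb)) / max(ca, cb)


def knn_predict(query, train_texts, train_labels, k=3):
    """Majority label among the k training texts closest to the query."""
    distances = sorted(
        (ncd(query, text), label)
        for text, label in zip(train_texts, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy training data, invented for illustration only.
    texts = [
        "Die korrupten Eliten in Berlin betrügen das einfache Volk seit Jahren.",
        "Das Robert Koch-Institut meldet heute neue Zahlen zur Sieben-Tage-Inzidenz.",
    ]
    labels = ["populist", "not_populist"]
    print(knn_predict("Die Eliten betrügen das Volk.", texts, labels, k=1))
```

Because gzip output carries a fixed overhead, distances between very short texts are noisy, so results on toy inputs like these should not be over-interpreted.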

SetFit Classifier

The exact model specifications and usage instructions for the SetFit classifier are available at https://huggingface.co/baunef/PopFit. The Jupyter notebook text_classification/SetFit/setfit_inference.ipynb downloads the model from Hugging Face and can be used for inference. The text_classification/SetFit/setfit_training.ipynb notebook trains a SetFit classifier and was run on Google Colaboratory.

PopBERT

The text_classification/zero_shot/zero_shot.ipynb notebook contains an inference pipeline for BERT language models and is configured for testing and inference with the PopBERT classifier (https://huggingface.co/luerhard/PopBERT).

Regression and Visualization

The regression_analysis and visualization directories contain R files that produce all figures and results reported in the study. The figures can be viewed in the figures directory.

Citing GC News

If you use parts of this repository in your research, please consider citing
