PyKEEN

PyKEEN (Python KnowlEdge EmbeddiNgs) is a Python package designed to train and evaluate knowledge graph embedding models (incorporating multi-modal information).

Installation • Quickstart • Datasets • Models • Support • Citation

Installation

The latest stable version of PyKEEN can be downloaded and installed from PyPI with:

pip install pykeen

The latest version of PyKEEN can be installed directly from the source on GitHub with:

pip install git+https://github.com/pykeen/pykeen.git

More information about installation (e.g., development mode, Windows installation, extras) can be found in the installation documentation.

Quickstart

This example shows how to train a model on a dataset and evaluate it on that dataset's test set.

The fastest way to get up and running is to use the pipeline function. It provides a high-level entry into the extensible functionality of this package. The following example shows how to train and evaluate the TransE model on the Nations dataset. By default, the training loop uses the stochastic local closed world assumption (sLCWA) training approach and evaluates with rank-based evaluation.

from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',
    dataset='nations',
)

The results are returned in an instance of the PipelineResult dataclass that has attributes for the trained model, the training loop, the evaluation, and more. See the tutorials on understanding the evaluation and making novel link predictions.
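As a hedged sketch of how the returned result might be inspected: the snippet below assumes PyKEEN 1.x, where PipelineResult is expected to expose model, metric_results, and save_to_directory(), and where metric_results.get_metric() accepts names such as 'hits@10'. These names may differ between versions. The function name train_and_inspect, the epoch count, the seed, and the output directory are illustrative choices, not part of PyKEEN.

```python
def train_and_inspect(num_epochs: int = 5, output_dir: str = "nations_transe"):
    """Train TransE on Nations, then return the model and its Hits@10.

    Requires `pip install pykeen`. Attribute and method names are assumptions
    based on PyKEEN 1.x and should be checked against the installed version.
    """
    # Imported lazily so this file can be loaded without PyKEEN installed.
    from pykeen.pipeline import pipeline

    result = pipeline(
        model="TransE",
        dataset="nations",
        training_kwargs=dict(num_epochs=num_epochs),  # short run for illustration
        random_seed=42,  # fixed seed so repeated runs are comparable
    )

    trained_model = result.model  # the trained model (a torch.nn.Module)
    hits_at_10 = result.metric_results.get_metric("hits@10")

    # Write the model, metrics, and metadata to a directory for later use.
    result.save_to_directory(output_dir)
    return trained_model, hits_at_10
```

Calling train_and_inspect() downloads nothing beyond what PyKEEN itself needs for the bundled Nations dataset, and a few epochs should finish quickly on a CPU.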

PyKEEN is extensible such that:

Each model has the same API, so anything from pykeen.models can be dropped in
Each training loop has the same API, so pykeen.training.LCWATrainingLoop can be dropped in
Triples factories can be generated by the user with pykeen.triples.TriplesFactory
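The drop-in behaviour described above can be sketched by swapping components through their string keys (the keys are taken from the tables in the Implementation section). This assumes that pipeline accepts the keyword arguments model, dataset, training_loop, loss, optimizer, and training_kwargs, as in PyKEEN 1.x. The helper names build_pipeline_config and run are hypothetical.

```python
def build_pipeline_config(model: str = "DistMult", training_loop: str = "lcwa") -> dict:
    """Build keyword arguments for pykeen.pipeline.pipeline (illustrative values)."""
    return dict(
        model=model,                  # any key from the Models table
        dataset="nations",
        training_loop=training_loop,  # "lcwa" or "slcwa"
        loss="softplus",              # any key from the Losses table
        optimizer="adam",             # any key from the Optimizers table
        training_kwargs=dict(num_epochs=5),
    )


def run(config: dict):
    """Run the pipeline with the given configuration (requires PyKEEN)."""
    # Imported lazily so the configuration can be built without PyKEEN installed.
    from pykeen.pipeline import pipeline

    return pipeline(**config)
```

For example, run(build_pipeline_config(model="RotatE", training_loop="slcwa")) would train a different model with the default training approach, leaving the rest of the setup unchanged.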

The full documentation can be found at https://pykeen.readthedocs.io.

Implementation

Below are the models, datasets, training modes, evaluators, and metrics implemented in pykeen.

Datasets (21)

Name	Reference	Description
ckg	`pykeen.datasets.CKG`	The Clinical Knowledge Graph (CKG) dataset from [santos2020]_.
codexlarge	`pykeen.datasets.CoDExLarge`	The CoDEx large dataset.
codexmedium	`pykeen.datasets.CoDExMedium`	The CoDEx medium dataset.
codexsmall	`pykeen.datasets.CoDExSmall`	The CoDEx small dataset.
conceptnet	`pykeen.datasets.ConceptNet`	The ConceptNet dataset from [speer2017]_.
drkg	`pykeen.datasets.DRKG`	The DRKG dataset.
fb15k	`pykeen.datasets.FB15k`	The FB15k dataset.
fb15k237	`pykeen.datasets.FB15k237`	The FB15k-237 dataset.
hetionet	`pykeen.datasets.Hetionet`	The Hetionet dataset is a large biological network.
kinships	`pykeen.datasets.Kinships`	The Kinships dataset.
nations	`pykeen.datasets.Nations`	The Nations dataset.
ogbbiokg	`pykeen.datasets.OGBBioKG`	The OGB BioKG dataset.
ogbwikikg	`pykeen.datasets.OGBWikiKG`	The OGB WikiKG dataset.
openbiolink	`pykeen.datasets.OpenBioLink`	The OpenBioLink dataset.
openbiolinkf1	`pykeen.datasets.OpenBioLinkF1`	The PyKEEN First Filtered OpenBioLink 2020 Dataset.
openbiolinkf2	`pykeen.datasets.OpenBioLinkF2`	The PyKEEN Second Filtered OpenBioLink 2020 Dataset.
openbiolinklq	`pykeen.datasets.OpenBioLinkLQ`	The low-quality variant of the OpenBioLink dataset.
umls	`pykeen.datasets.UMLS`	The UMLS dataset.
wn18	`pykeen.datasets.WN18`	The WN18 dataset.
wn18rr	`pykeen.datasets.WN18RR`	The WN18-RR dataset.
yago310	`pykeen.datasets.YAGO310`	The YAGO3-10 dataset is a subset of YAGO3 that only contains entities with at least 10 relations.

Models (23)

Name	Reference	Citation
ComplEx	`pykeen.models.ComplEx`	Trouillon et al., 2016
ComplExLiteral	`pykeen.models.ComplExLiteral`	Agustinus et al., 2018
ConvE	`pykeen.models.ConvE`	Dettmers et al., 2018
ConvKB	`pykeen.models.ConvKB`	Nguyen et al., 2018
DistMult	`pykeen.models.DistMult`	Yang et al., 2014
DistMultLiteral	`pykeen.models.DistMultLiteral`	Agustinus et al., 2018
ERMLP	`pykeen.models.ERMLP`	Dong et al., 2014
ERMLPE	`pykeen.models.ERMLPE`	Sharifzadeh et al., 2019
HolE	`pykeen.models.HolE`	Nickel et al., 2016
KG2E	`pykeen.models.KG2E`	He et al., 2015
NTN	`pykeen.models.NTN`	Socher et al., 2013
ProjE	`pykeen.models.ProjE`	Shi et al., 2017
RESCAL	`pykeen.models.RESCAL`	Nickel et al., 2011
RGCN	`pykeen.models.RGCN`	Schlichtkrull et al., 2018
RotatE	`pykeen.models.RotatE`	Sun et al., 2019
SimplE	`pykeen.models.SimplE`	Kazemi et al., 2018
StructuredEmbedding	`pykeen.models.StructuredEmbedding`	Bordes et al., 2011
TransD	`pykeen.models.TransD`	Ji et al., 2015
TransE	`pykeen.models.TransE`	Bordes et al., 2013
TransH	`pykeen.models.TransH`	Wang et al., 2014
TransR	`pykeen.models.TransR`	Lin et al., 2015
TuckER	`pykeen.models.TuckER`	Balazevic et al., 2019
UnstructuredModel	`pykeen.models.UnstructuredModel`	Bordes et al., 2014

Losses (7)

Name	Reference	Description
bceaftersigmoid	`pykeen.losses.BCEAfterSigmoidLoss`	A loss function which uses the numerically unstable version of explicit Sigmoid + BCE.
bcewithlogits	`pykeen.losses.BCEWithLogitsLoss`	A wrapper around the numerically stable version of the PyTorch binary cross entropy loss.
crossentropy	`pykeen.losses.CrossEntropyLoss`	Evaluate cross entropy after softmax output.
marginranking	`pykeen.losses.MarginRankingLoss`	A wrapper around the PyTorch margin ranking loss.
mse	`pykeen.losses.MSELoss`	A wrapper around the PyTorch mean square error loss.
nssa	`pykeen.losses.NSSALoss`	An implementation of the self-adversarial negative sampling loss function proposed by [sun2019]_.
softplus	`pykeen.losses.SoftplusLoss`	A loss function for the softplus.
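The difference between the bceaftersigmoid and bcewithlogits rows can be illustrated with scalar arithmetic. The following is a minimal sketch in plain Python, not PyKEEN's or PyTorch's code; the function names are hypothetical, and the stable form uses the standard identity max(x, 0) - x*y + log(1 + exp(-|x|)).

```python
import math


def bce_after_sigmoid(logit: float, label: float) -> float:
    """Explicit sigmoid followed by binary cross entropy (numerically unstable)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def bce_with_logits(logit: float, label: float) -> float:
    """Binary cross entropy computed directly on the logit (numerically stable)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# For moderate logits, both formulations agree.
moderate_gap = abs(bce_after_sigmoid(2.0, 1.0) - bce_with_logits(2.0, 1.0))

# For a large logit, the sigmoid rounds to exactly 1.0 in double precision,
# so the explicit version ends up taking log(0.0) and fails.
try:
    bce_after_sigmoid(40.0, 0.0)
    unstable_failed = False
except ValueError:
    unstable_failed = True

stable_value = bce_with_logits(40.0, 0.0)  # remains finite
```

This is why the logits-based variant is usually the safer choice when scores can become large.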

Regularizers (5)

Name	Reference	Description
combined	`pykeen.regularizers.CombinedRegularizer`	A convex combination of regularizers.
lp	`pykeen.regularizers.LpRegularizer`	A simple L_p norm based regularizer.
no	`pykeen.regularizers.NoRegularizer`	A regularizer which does not perform any regularization.
powersum	`pykeen.regularizers.PowerSumRegularizer`	A simple x^p based regularizer.
transh	`pykeen.regularizers.TransHRegularizer`	A regularizer for the soft constraints in TransH.
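To make the distinction between the lp and powersum rows concrete, here is a small sketch in plain Python. It only shows the underlying formulas on a list of numbers; the actual PyKEEN classes operate on tensors and may apply additional weighting or normalization, and the function names below are hypothetical.

```python
from typing import Sequence


def lp_norm(values: Sequence[float], p: float = 2.0) -> float:
    """L_p norm: the p-th root of the sum of |x|^p."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def power_sum(values: Sequence[float], p: float = 2.0) -> float:
    """Power sum: the sum of |x|^p, without taking the p-th root."""
    return sum(abs(v) ** p for v in values)


embedding = [3.0, 4.0]                # illustrative embedding vector
l2_penalty = lp_norm(embedding)       # sqrt(9 + 16)
squared_penalty = power_sum(embedding)  # 9 + 16
```

Skipping the root makes the power sum cheaper to differentiate, at the cost of growing faster for large values.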

Optimizers (6)

Name	Reference	Description
adadelta	`torch.optim.Adadelta`	Implements Adadelta algorithm.
adagrad	`torch.optim.Adagrad`	Implements Adagrad algorithm.
adam	`torch.optim.Adam`	Implements Adam algorithm.
adamax	`torch.optim.Adamax`	Implements Adamax algorithm (a variant of Adam based on infinity norm).
adamw	`torch.optim.AdamW`	Implements AdamW algorithm.
sgd	`torch.optim.SGD`	Implements stochastic gradient descent (optionally with momentum).

Training Loops (2)

Name	Reference	Description
lcwa	`pykeen.training.LCWATrainingLoop`	A training loop that uses the local closed world assumption training approach.
slcwa	`pykeen.training.SLCWATrainingLoop`	A training loop that uses the stochastic local closed world assumption training approach.

Negative Samplers (2)

Name	Reference	Description
basic	`pykeen.sampling.BasicNegativeSampler`	A basic negative sampler.
bernoulli	`pykeen.sampling.BernoulliNegativeSampler`	An implementation of the Bernoulli negative sampling approach proposed by [wang2014]_.
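The idea behind basic negative sampling, as used with the sLCWA training approach, can be sketched as follows: each positive triple is corrupted by replacing either its head or its tail with a randomly chosen entity. This is an illustrative plain-Python version, not the code of pykeen.sampling.BasicNegativeSampler; the function names, the 50/50 head/tail choice, and the absence of filtering are assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[int, int, int]  # (head, relation, tail) as integer IDs


def corrupt_triple(triple: Triple, num_entities: int, rng: random.Random) -> Triple:
    """Replace the head or the tail (chosen uniformly) with a random entity."""
    head, relation, tail = triple
    if rng.random() < 0.5:
        return (rng.randrange(num_entities), relation, tail)
    return (head, relation, rng.randrange(num_entities))


def sample_negatives(
    positives: Sequence[Triple],
    num_entities: int,
    num_negs_per_pos: int = 1,
    rng: Optional[random.Random] = None,
) -> List[Triple]:
    """Create num_negs_per_pos corrupted triples for every positive triple."""
    rng = rng or random.Random()
    return [
        corrupt_triple(triple, num_entities, rng)
        for triple in positives
        for _ in range(num_negs_per_pos)
    ]
```

Note that this sketch may occasionally return the original triple or another true triple, since it does not filter its samples against the known positives.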

Stoppers (2)

Name	Reference	Description
early	`pykeen.stoppers.EarlyStopper`	A harness for early stopping.
nop	`pykeen.stoppers.NopStopper`	A stopper that does nothing.

Evaluators (2)

Name	Reference	Description
rankbased	`pykeen.evaluation.RankBasedEvaluator`	A rank-based evaluator for KGE models.
sklearn	`pykeen.evaluation.SklearnEvaluator`	An evaluator that uses a Scikit-learn metric.

Metrics (6)

Metric	Description	Evaluator	Reference
Adjusted Mean Rank	The mean over all chance-adjusted ranks: mean_i (2r_i / (num_entities+1)). Lower is better.	rankbased	`pykeen.evaluation.RankBasedMetricResults`
Average Precision Score	The area under the precision-recall curve, between [0.0, 1.0]. Higher is better.	sklearn	`pykeen.evaluation.SklearnMetricResults`
Hits At K	The hits at k for different values of k, i.e. the relative frequency of ranks not larger than k. Higher is better.	rankbased	`pykeen.evaluation.RankBasedMetricResults`
Mean Rank	The mean over all ranks: mean_i r_i. Lower is better.	rankbased	`pykeen.evaluation.RankBasedMetricResults`
Mean Reciprocal Rank	The mean over all reciprocal ranks: mean_i (1/r_i). Higher is better.	rankbased	`pykeen.evaluation.RankBasedMetricResults`
Roc Auc Score	The area under the ROC curve between [0.0, 1.0]. Higher is better.	sklearn	`pykeen.evaluation.SklearnMetricResults`
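The rank-based metrics in the table above can be computed directly from a list of ranks. The sketch below follows the formulas exactly as written in the table; it is plain Python for illustration, not the code of RankBasedEvaluator. The ranks and the entity count are made-up example values.

```python
from typing import Sequence


def mean_rank(ranks: Sequence[int]) -> float:
    """Mean over all ranks. Lower is better."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Mean over all reciprocal ranks. Higher is better."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Relative frequency of ranks not larger than k. Higher is better."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def adjusted_mean_rank(ranks: Sequence[int], num_entities: int) -> float:
    """Mean over chance-adjusted ranks 2 * r / (num_entities + 1). Lower is better."""
    return sum(2.0 * r / (num_entities + 1) for r in ranks) / len(ranks)


ranks = [1, 3, 10, 2]  # hypothetical ranks of the true entities
num_entities = 14      # hypothetical number of entities in the graph

mr = mean_rank(ranks)
mrr = mean_reciprocal_rank(ranks)
hits3 = hits_at_k(ranks, k=3)
amr = adjusted_mean_rank(ranks, num_entities)
```

An adjusted mean rank close to 1 corresponds to random ranking, which makes the value comparable across datasets with different numbers of entities.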

Trackers (3)

Name	Reference	Description
mlflow	`pykeen.trackers.MLFlowResultTracker`	A tracker for MLflow.
neptune	`pykeen.trackers.NeptuneResultTracker`	A tracker for Neptune.ai.
wandb	`pykeen.trackers.WANDBResultTracker`	A tracker for Weights and Biases.

Hyper-parameter Optimization

Samplers (3)

Name	Reference	Description
grid	`optuna.samplers.GridSampler`	Sampler using grid search.
random	`optuna.samplers.RandomSampler`	Sampler using random sampling.
tpe	`optuna.samplers.TPESampler`	Sampler using TPE (Tree-structured Parzen Estimator) algorithm.

Any sampler class extending optuna.samplers.BaseSampler, such as Optuna's CMA-ES sampler (optuna.samplers.CmaEsSampler), can also be used.
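A hyper-parameter search with one of these samplers might look like the sketch below. It assumes that pykeen.hpo.hpo_pipeline accepts the keyword arguments dataset, model, n_trials, and sampler, and that its result offers save_to_directory(), as in PyKEEN 1.x; check the documentation of the installed version. The function name run_hpo, the trial count, and the output directory are illustrative.

```python
def run_hpo(n_trials: int = 10, sampler: str = "tpe", output_dir: str = "nations_transe_hpo"):
    """Search TransE hyper-parameters on Nations (requires `pip install pykeen`)."""
    # Imported lazily so this file can be loaded without PyKEEN installed.
    from pykeen.hpo import hpo_pipeline

    hpo_result = hpo_pipeline(
        dataset="nations",
        model="TransE",
        n_trials=n_trials,  # number of configurations to try
        sampler=sampler,    # "grid", "random", or "tpe" from the table above
    )

    # Store the study summary and the best configuration for later inspection.
    hpo_result.save_to_directory(output_dir)
    return hpo_result
```

Note that grid search generally needs an explicit search space, so "random" or "tpe" are the simpler starting points.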

Experimentation

Reproduction

PyKEEN includes a set of curated experimental settings for reproducing past landmark experiments. They can be accessed and run like:

pykeen experiments reproduce tucker balazevic2019 fb15k

The three arguments are the model name, the reference, and the dataset. The output directory can optionally be set with -d.

Ablation

PyKEEN includes the ability to specify ablation studies using the hyper-parameter optimization module. They can be run like:

pykeen experiments ablation ~/path/to/config.json

Large-scale Reproducibility and Benchmarking Study

We used PyKEEN to perform a large-scale reproducibility and benchmarking study, which is described in our article:

@article{ali2020benchmarking,
  title={Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework},
  author={Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Galkin, Mikhail and Sharifzadeh, Sahand and Fischer, Asja and Tresp, Volker and Lehmann, Jens},
  journal={arXiv preprint arXiv:2006.13365},
  year={2020}
}

We have made all code, experimental configurations, results, and analyses that led to our interpretations available at https://github.com/pykeen/benchmarking.

Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

Acknowledgements

Supporters

This project has been supported by several organizations (in alphabetical order):

Logo

The PyKEEN logo was designed by Carina Steinborn.

Citation

If you have found PyKEEN useful in your work, please consider citing our article:

@article{ali2020pykeen,
  title={PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings},
  author={Ali, Mehdi and Berrendorf, Max and Hoyt, Charles Tapley and Vermue, Laurent and Sharifzadeh, Sahand and Tresp, Volker and Lehmann, Jens},
  journal={arXiv preprint arXiv:2007.14175},
  year={2020}
}

