💾📉 Improve Memory-Efficiency of Converting OGB Datasets to TriplesFactory (pykeen#1253)

- update loading of OGB datasets to be more memory-efficient
- improve typing in OGB datasets module
- fix OGB BioKG entity count
- misc: enforce encoding when reading/writing README

Fix pykeen#1252
mberr authored Apr 15, 2023
1 parent fdce5b1 commit 3f8d0e2
Showing 4 changed files with 249 additions and 76 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -141,7 +141,7 @@ have a suggestion for another dataset to include in PyKEEN, please let us know
| Kinships | [`pykeen.datasets.Kinships`](https://pykeen.readthedocs.io/en/latest/api/pykeen.datasets.Kinships.html) | [Kemp *et al*., 2006](https://www.aaai.org/Papers/AAAI/2006/AAAI06-061.pdf) | 104 | 25 | 10686 |
| Nations | [`pykeen.datasets.Nations`](https://pykeen.readthedocs.io/en/latest/api/pykeen.datasets.Nations.html) | [`ZhenfengLei/KGDatasets`](https://github.com/ZhenfengLei/KGDatasets) | 14 | 55 | 1992 |
| NationsL | [`pykeen.datasets.NationsLiteral`](https://pykeen.readthedocs.io/en/latest/api/pykeen.datasets.NationsLiteral.html) | [`pykeen/pykeen`](https://github.com/pykeen/pykeen) | 14 | 55 | 1992 |
-| OGB BioKG | [`pykeen.datasets.OGBBioKG`](https://pykeen.readthedocs.io/en/latest/api/pykeen.datasets.OGBBioKG.html) | [Hu *et al*., 2020](https://arxiv.org/abs/2005.00687) | 45085 | 51 | 5088433 |
+| OGB BioKG | [`pykeen.datasets.OGBBioKG`](https://pykeen.readthedocs.io/en/latest/api/pykeen.datasets.OGBBioKG.html) | [Hu *et al*., 2020](https://arxiv.org/abs/2005.00687) | 93773 | 51 | 5088434 |
| OGB WikiKG2 | [`pykeen.datasets.OGBWikiKG2`](https://pykeen.readthedocs.io/en/latest/api/pykeen.datasets.OGBWikiKG2.html) | [Hu *et al*., 2020](https://arxiv.org/abs/2005.00687) | 2500604 | 535 | 17137181 |
| OpenBioLink | [`pykeen.datasets.OpenBioLink`](https://pykeen.readthedocs.io/en/latest/api/pykeen.datasets.OpenBioLink.html) | [Breit *et al*., 2020](https://doi.org/10.1093/bioinformatics/btaa274) | 180992 | 28 | 4563407 |
| OpenBioLink LQ | [`pykeen.datasets.OpenBioLinkLQ`](https://pykeen.readthedocs.io/en/latest/api/pykeen.datasets.OpenBioLinkLQ.html) | [Breit *et al*., 2020](https://doi.org/10.1093/bioinformatics/btaa274) | 480876 | 32 | 27320889 |
4 changes: 2 additions & 2 deletions src/pykeen/cli.py
@@ -601,7 +601,7 @@ def readme(check: bool):
     new_readme = get_readme()

     if check:
-        with open(readme_path) as file:
+        with open(readme_path, encoding="utf8") as file:
             old_readme = file.read()
         if new_readme.strip() != old_readme.strip():
             click.secho(
@@ -616,7 +616,7 @@ def readme(check: bool):

             sys.exit(-1)

-    with open(readme_path, "w") as file:
+    with open(readme_path, "w", encoding="utf8") as file:
         print(new_readme, file=file)  # noqa:T201
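
The two `open` calls in this file now pass `encoding="utf8"` explicitly. Without an explicit encoding, `open` in text mode may use a locale-dependent default, which on some platforms is not UTF-8 and can fail on characters such as emoji. The following is a minimal sketch of the same read-and-compare and write pattern, not the PyKEEN implementation; the helper names `readme_is_current` and `write_readme` are hypothetical and exist only for illustration.

```python
import tempfile
from pathlib import Path


def readme_is_current(readme_path: Path, new_readme: str) -> bool:
    """Return True if the file on disk matches the freshly generated text.

    Hypothetical helper; mirrors the `--check` branch of the diff above.
    """
    # Explicit encoding: do not rely on the platform's default.
    with open(readme_path, encoding="utf8") as file:
        old_readme = file.read()
    # Compare ignoring leading/trailing whitespace, as the diff does.
    return new_readme.strip() == old_readme.strip()


def write_readme(readme_path: Path, new_readme: str) -> None:
    """Write the generated text as UTF-8 (hypothetical helper)."""
    with open(readme_path, "w", encoding="utf8") as file:
        print(new_readme, file=file)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        path = Path(directory) / "README.md"
        # Non-ASCII content that a legacy code page may not be able to encode.
        write_readme(path, "💾📉 PyKEEN")
        print(readme_is_current(path, "💾📉 PyKEEN"))
```

Passing the encoding at every call site keeps the check reproducible across machines, since the bytes on disk no longer depend on the locale of whoever last regenerated the file.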

