forked from THUDM/GATNE

Showing 12 changed files with 5,537 additions and 10 deletions.

.gitignore
@@ -0,0 +1,129 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

.vscode
data/
runs/
src/__pycache__

README.md
@@ -0,0 +1,86 @@
# GATNE

### [Project](https://sites.google.com/view/gatne) | [Arxiv](https://arxiv.org/abs/1905.01669)

Representation Learning for Attributed Multiplex Heterogeneous Network.

[Yukuo Cen](https://sites.google.com/view/yukuocen), Xu Zou, Jianwei Zhang, [Hongxia Yang](https://sites.google.com/site/hystatistics/home), [Jingren Zhou](http://www.cs.columbia.edu/~jrzhou/), [Jie Tang](http://keg.cs.tsinghua.edu.cn/jietang/)

Accepted to KDD 2019 Research Track!

## Prerequisites

- Linux or macOS
- Python 3
- TensorFlow >= 1.8
- NVIDIA GPU + CUDA cuDNN

## Getting Started

### Installation

Clone this repo.

```bash
git clone https://github.com/THUDM/GATNE
cd GATNE
```

Please install dependencies by

```bash
pip install -r requirements.txt
```

### Dataset

These datasets are sampled from the original datasets.

- Amazon contains 10,166 nodes and 148,865 edges. [Source](http://jmcauley.ucsd.edu/data/amazon)
- Twitter contains 10,000 nodes and 331,899 edges. [Source](https://snap.stanford.edu/data/higgs-twitter.html)
- YouTube contains 2,000 nodes and 1,310,617 edges. [Source](http://socialcomputing.asu.edu/datasets/YouTube)
- Alibaba contains 6,163 nodes and 17,865 edges.

You can download the preprocessed datasets by running `python scripts/download_preprocessed_data.py`. (The Alibaba dataset is to be released.)
If you are in a region where Dropbox is blocked (e.g., mainland China), try `python scripts/download_preprocessed_data.py --cn`.

### Training

#### Training on the existing datasets

You can use `./scripts/run_example.sh` or `python src/main.py --input example_data` to train the GATNE-T model on the example data. (If you share the server with others or want to use specific GPUs, you may need to set `CUDA_VISIBLE_DEVICES`.)

If you want to train on the Amazon dataset, you can run `python src/main.py --input data/amazon` or `python src/main.py --input data/amazon --features data/feature.txt` to train the GATNE-T or GATNE-I model, respectively.

You can use the following commands to train GATNE-T on the Twitter and YouTube datasets. We only evaluate edges of the first edge type on the Twitter dataset because the other edge types have too few edges.
`python src/main.py --input data/twitter --eval-type 1`
`python src/main.py --input data/youtube`

As the Twitter and YouTube datasets do not have node attributes, you can generate heuristic features for them, such as DeepWalk embeddings. Then you can train the GATNE-I model on these two datasets by adding the `--features` argument; a minimal sketch is given below.
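
As an illustration, here is a sketch (not part of this repository) of one way to produce DeepWalk-style features in the `feature.txt` format, assuming `networkx` and `gensim` are installed; the paths, walk parameters, and embedding dimension below are placeholders.

```python
# Illustrative sketch: DeepWalk-style node features written in the feature.txt format.
# Assumes networkx and gensim are available; paths and hyperparameters are placeholders.
import random

import networkx as nx
from gensim.models import Word2Vec

DIM = 64            # feature dimension written to feature.txt
WALKS_PER_NODE = 10
WALK_LENGTH = 20

# Build an undirected graph from train.txt, ignoring edge types.
G = nx.Graph()
with open("data/twitter/train.txt") as f:
    for line in f:
        _, u, v = line.split()
        G.add_edge(u, v)

# Uniform random walks starting from every node.
walks = []
for _ in range(WALKS_PER_NODE):
    for start in G.nodes():
        walk = [start]
        while len(walk) < WALK_LENGTH:
            neighbors = list(G.neighbors(walk[-1]))
            if not neighbors:
                break
            walk.append(random.choice(neighbors))
        walks.append(walk)

# Train skip-gram embeddings on the walks (gensim < 4 uses size= instead of vector_size=).
model = Word2Vec(walks, vector_size=DIM, window=5, min_count=0, sg=1, workers=4)

# Write features in the "<num> <dim>" / "<node> <f_1> ... <f_dim>" format expected by --features.
with open("data/twitter/feature.txt", "w") as f:
    f.write(f"{G.number_of_nodes()} {DIM}\n")
    for node in G.nodes():
        f.write(node + " " + " ".join(str(x) for x in model.wv[node]) + "\n")
```

The resulting `feature.txt` can then be passed to training via `--features`.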

#### Training on your own datasets

If you want to train GATNE-T/I on your own dataset, you should prepare the following three (or four) files (a toy example follows the list):
- train.txt: Each line represents an edge and contains three tokens `<edge_type> <node1> <node2>`, where each token can be either a number or a string.
- valid.txt: Each line represents an edge or a non-edge and contains four tokens `<edge_type> <node1> <node2> <label>`, where `<label>` is either 1 or 0, denoting an edge or a non-edge.
- test.txt: the same format as valid.txt.
- feature.txt (optional): The first line contains two numbers `<num> <dim>`, the number of nodes and the feature dimension. Each subsequent line describes the features of one node, i.e., `<node> <f_1> <f_2> ... <f_dim>`.
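
As a concrete illustration, the hypothetical snippet below writes tiny `train.txt`, `valid.txt`, and `feature.txt` files in the formats described above; all edge types, node names, labels, and feature values are made up.

```python
# Hypothetical toy inputs illustrating the file formats above; every value is made up.
train_edges = [("1", "u1", "i1"), ("1", "u2", "i1"), ("2", "u1", "i2")]  # <edge_type> <node1> <node2>
valid_pairs = [("1", "u2", "i2", "1"), ("2", "u2", "i1", "0")]           # <edge_type> <node1> <node2> <label>

with open("train.txt", "w") as f:
    f.writelines(" ".join(edge) + "\n" for edge in train_edges)

with open("valid.txt", "w") as f:  # test.txt uses the same format
    f.writelines(" ".join(pair) + "\n" for pair in valid_pairs)

# Optional features: a "<num> <dim>" header, then one "<node> <f_1> ... <f_dim>" line per node.
features = {"u1": [0.1, 0.2], "u2": [0.3, 0.4], "i1": [0.5, 0.6], "i2": [0.7, 0.8]}
with open("feature.txt", "w") as f:
    f.write(f"{len(features)} 2\n")
    for node, vec in features.items():
        f.write(node + " " + " ".join(str(x) for x in vec) + "\n")
```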

If your dataset contains several node types and you want to use meta-path based random walks, you should also provide one additional file (see the example after this item):
- node_type.txt: Each line contains two tokens `<node> <node_type>`, where `<node_type>` should be consistent with the meta-path schema in the training command, i.e., `--schema node_type_1-node_type_2-...-node_type_k-node_type_1`. (Note that the first node type in the schema should equal the last one.)
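
Continuing the toy example above, a hypothetical `node_type.txt` and a matching `--schema` value could look like this; the node types `u` and `i` are made up.

```python
# Hypothetical node types for the toy nodes above ("u" and "i" are made-up type names).
node_types = {"u1": "u", "u2": "u", "i1": "i", "i2": "i"}
with open("node_type.txt", "w") as f:
    for node, node_type in node_types.items():
        f.write(f"{node} {node_type}\n")

# A meta-path schema consistent with these types starts and ends with the same type, e.g.:
#   python src/main.py --input <your_dataset_dir> --schema u-i-u
```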

If you have ANY difficulty getting things to work in the above steps, feel free to open an issue. You can expect a reply within 24 hours.

## Cite

Please cite our paper if you find this code useful for your research:

```
@article{cen2019representation,
  title={Representation Learning for Attributed Multiplex Heterogeneous Network},
  author={Cen, Yukuo and Zou, Xu and Zhang, Jianwei and Yang, Hongxia and Zhou, Jingren and Tang, Jie},
  journal={arXiv preprint arXiv:1905.01669},
  year={2019}
}
```