release v0.2.0

--fixup
zhangguichuan · Oct 12, 2019 · abb6036
1 parent e45d0a0
commit abb6036

Showing 86 changed files with 4,004 additions and 3,295 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,34 @@
+Change log
+==========
+
+This file lists all notable changes to the GraphVite library.
+
+v0.2.0 - 2019-10-11
+-------------------
+- Add scalable multi-GPU prediction for node embedding and knowledge graph embedding.
+  Evaluation on link prediction is 4.6x faster than v0.1.0.
+- New demo dataset `math` and entity prediction evaluation for knowledge graph.
+- Support Kepler and Turing GPU architectures.
+- Automatically choose the best episode size with regard to the RAM limit.
+- Add template config files for applications.
+- Change the update of global embeddings from average to accumulation. Fix a serious
+  numeric problem in the update.
+- Move file format settings from graph to application. Now one can customize formats
+  and use comments in evaluation files. Add documentation for the data format.
+- Separate GPU implementation into training routines and models. Routines are in
+  `include/instance/gpu/*` and models are in `include/instance/model/*`.
+
+v0.1.0 - 2019-08-05
+-------------------
+- Multi-GPU training of large-scale graph embedding 
+- 3 applications: node embedding, knowledge graph embedding and graph &
+  high-dimensional data visualization
+- Node embedding
+    - Model: DeepWalk, LINE, node2vec
+    - Evaluation: node classification, link prediction
+- Knowledge graph embedding
+    - Model: TransE, DistMult, ComplEx, SimplE, RotatE
+    - Evaluation: link prediction
+- Graph & High-dimensional data visualization
+    - Model: LargeVis
+    - Evaluation: visualization(2D / 3D), animation(3D), hierarchy(2D)
diff --git a/README.md b/README.md
@@ -34,7 +34,7 @@ Here is a summary of the training time of GraphVite along with the best open-sou
 implementations on 3 applications. All the time is reported based on a server with
 24 CPU threads and 4 V100 GPUs.
 
-Node embedding on [Youtube] dataset.
+Training time of node embedding on [Youtube] dataset.
 
 | Model      | Existing Implementation       | GraphVite | Speedup |
 |------------|-------------------------------|-----------|---------|
@@ -50,24 +50,24 @@ Node embedding on [Youtube] dataset.
 [2]: https://github.com/tangjianpku/LINE
 [3]: https://github.com/aditya-grover/node2vec
 
-Knowledge graph embedding on [FB15k] dataset.
+Training / evaluation time of knowledge graph embedding on [FB15k] dataset.
 
-| Model           | Existing Implementation       | GraphVite | Speedup |
-|-----------------|-------------------------------|-----------|---------|
-| [TransE]        | [1.31 hrs (1 GPU)][3]         | 14.8 mins | 5.30x   |
-| [RotatE]        | [3.69 hrs (1 GPU)][4]         | 27.0 mins | 8.22x   |
+| Model           | Existing Implementation           | GraphVite          | Speedup       |
+|-----------------|-----------------------------------|--------------------|---------------|
+| [TransE]        | [1.31 hrs / 1.75 mins (1 GPU)][3] | 13.5 mins / 54.3 s | 5.82x / 1.93x |
+| [RotatE]        | [3.69 hrs / 4.19 mins (1 GPU)][4] | 28.1 mins / 55.8 s | 7.88x / 4.50x |
 
 [FB15k]: http://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf
 [TransE]: http://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf
 [RotatE]: https://arxiv.org/pdf/1902.10197.pdf
 [3]: https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding
 [4]: https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding
 
-High-dimensional data visualization on [MNIST] dataset.
+Training time of high-dimensional data visualization on [MNIST] dataset.
 
 | Model        | Existing Implementation       | GraphVite | Speedup |
 |--------------|-------------------------------|-----------|---------|
-| [LargeVis]   | [15.3 mins (CPU parallel)][5] | 15.1 s    | 60.8x   |
+| [LargeVis]   | [15.3 mins (CPU parallel)][5] | 13.9 s    | 66.8x   |
 
 [MNIST]: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
 [LargeVis]: https://arxiv.org/pdf/1602.00370.pdf
@@ -85,19 +85,15 @@ Installation
 
 ### From Conda ###
 
-GraphVite can be installed through conda with only one line.
-
 ```bash
-conda install -c milagraph graphvite cudatoolkit=x.x
+conda install -c milagraph graphvite cudatoolkit=$(nvcc -V | grep -Po "(?<=V)\d+\.\d+")
 ```
 
-where `x.x` is your CUDA version, e.g. 9.2 or 10.0.
-
 If you only need embedding training without evaluation, you can use the following
 alternative with minimal dependencies.
 
 ```bash
-conda install -c milagraph graphvite-mini cudatoolkit=x.x
+conda install -c milagraph graphvite-mini cudatoolkit=$(nvcc -V | grep -Po "(?<=V)\d+\.\d+")
 ```
 
 ### From Source ###
@@ -113,6 +109,24 @@ cd build && cmake .. && make && cd -
 cd python && python setup.py install && cd -
 ```
 
+### On Colab ###
+
+```bash
+!wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
+!chmod +x Miniconda3-latest-Linux-x86_64.sh
+!./Miniconda3-latest-Linux-x86_64.sh -b -p /usr/local -f
+
+!conda install -y -c milagraph -c conda-forge graphvite \
+    python=3.6 cudatoolkit=$(nvcc -V | grep -Po "(?<=V)\d+\.\d+")
+!conda install -y wurlitzer ipykernel
+```
+
+```python
+import site
+site.addsitedir("/usr/local/lib/python3.6/site-packages")
+%reload_ext wurlitzer
+```
+
 Quick Start
 -----------
 
@@ -126,10 +140,14 @@ Typically, the example takes no more than 1 minute. You will obtain some output
 
 ```
 Batch id: 6000
-loss = 0.371641
+loss = 0.371041
+
+------------- link prediction --------------
+AUC: 0.899933
 
-macro-F1@20%: 0.236794
-micro-F1@20%: 0.388110
+----------- node classification ------------
+macro-F1@20%: 0.242114
+micro-F1@20%: 0.391342
 ```
 
 Baseline Benchmark
@@ -139,13 +157,30 @@ To reproduce a baseline benchmark, you only need to specify the keywords of the
 experiment, e.g. model and dataset.
 
 ```bash
-graphvite baseline [keyword ...] [--no-eval] [--gpu n] [--cpu m]
+graphvite baseline [keyword ...] [--no-eval] [--gpu n] [--cpu m] [--epoch e]
 ```
 
 You may also set the number of GPUs and the number of CPUs per GPU.
 
 Use ``graphvite list`` to get a list of available baselines.
 
+Custom Experiment
+-----------------
+
+Create a yaml configuration scaffold for graph, knowledge graph, visualization or
+word graph.
+
+```bash
+graphvite new [application ...] [--file f]
+```
+
+Fill in the required entries in the configuration following the instructions. You
+can then run the configuration with
+
+```bash
+graphvite run [config] [--no-eval] [--gpu n] [--cpu m] [--epoch e]
+```
+
 High-dimensional Data Visualization
 -----------------------------------
 
@@ -156,8 +191,8 @@ GraphVite.
 graphvite visualize [file] [--label label_file] [--save save_file] [--perplexity n] [--3d]
 ```
 
-The file can be either in numpy dump or text format. For the save file, we recommend
-to use a `png` format, while `pdf` is also supported.
+The file can be either a numpy dump `*.npy` or a text matrix `*.txt`. For the save
+file, we recommend using the `png` format; `pdf` is also supported.
 
 Contributing
 ------------
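
Note on the conda commands in the README hunk above: they take the `cudatoolkit` version from `nvcc -V` with a look-behind regex. The sketch below applies the same pattern in Python to a sample string, to show what the expression captures. The sample output and the helper name `cuda_major_minor` are illustrative assumptions and not part of GraphVite.

```python
import re

# Sample text in the shape printed by `nvcc -V`; the release numbers are illustrative.
SAMPLE_NVCC_OUTPUT = """\
nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 10.0, V10.0.130
"""


def cuda_major_minor(nvcc_output):
    """Return 'major.minor' from `nvcc -V` style output, or None if absent."""
    # Same idea as the grep call: digits, a literal dot, digits, preceded by 'V'.
    match = re.search(r"(?<=V)\d+\.\d+", nvcc_output)
    return match.group(0) if match else None


print(cuda_major_minor(SAMPLE_NVCC_OUTPUT))  # 10.0
```

The "V" in "NVIDIA" is not followed by a digit, so the first match is the version token on the last line.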

diff --git a/conda/graphvite-mini/meta.yaml b/conda/graphvite-mini/meta.yaml
@@ -1,6 +1,6 @@
 package:
   name: graphvite-mini
-  version: 0.1.0
+  version: 0.2.0
 
 source:
   path: ../..
@@ -39,6 +39,7 @@ requirements:
     - easydict
     - six
     - future
+    - psutil
 
 build:
   string:

diff --git a/conda/graphvite/meta.yaml b/conda/graphvite/meta.yaml
@@ -1,6 +1,6 @@
 package:
   name: graphvite
-  version: 0.1.0
+  version: 0.2.0
 
 source:
   path: ../..
@@ -40,6 +40,7 @@ requirements:
     - six
     - future
     - imageio
+    - psutil
     - scipy
     - matplotlib
     - pytorch

diff --git a/conda/requirements.txt b/conda/requirements.txt
@@ -17,6 +17,7 @@ conda-forge::easydict
 six
 future
 imageio
+psutil
 scipy
 matplotlib
 pytorch

diff --git a/config/demo/math.yaml b/config/demo/math.yaml
@@ -0,0 +1,40 @@
+application:
+  knowledge graph
+
+resource:
+  gpus: [0]
+  cpu_per_gpu: 8
+  dim: 512
+
+graph:
+  file_name: <math.train>
+
+build:
+  optimizer:
+    type: Adam
+    lr: 5.0e-3
+    weight_decay: 0
+  num_partition: auto
+  num_negative: 8
+  batch_size: 100000
+  episode_size: 100
+
+train:
+  model: RotatE
+  num_epoch: 2000
+  margin: 9
+  sample_batch_size: 2000
+  adversarial_temperature: 2
+  log_frequency: 100
+
+evaluate:
+  task: link prediction
+  file_name: <math.test>
+  filter_files:
+    - <math.train>
+    - <math.valid>
+    - <math.test>
+  target: tail
+
+save:
+  file_name: rotate_math.pkl
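
Note on `filter_files` and `target: tail` in the config above: assuming they follow the usual filtered setting for knowledge graph link prediction, other known true tails are excluded from the candidates before the test triple is ranked. The sketch below is a generic illustration of that protocol, not GraphVite's implementation. The function name, entity names and scores are hypothetical.

```python
def filtered_tail_rank(scores, true_tail, known_tails):
    """Rank of `true_tail` among candidate tails, skipping other known true tails.

    scores: dict mapping candidate tail -> score (higher is better).
    known_tails: tails that form a true triple with the same head and relation
        in any of the filter files (assumed semantics).
    """
    target = scores[true_tail]
    rank = 1
    for tail, score in scores.items():
        if tail == true_tail or tail in known_tails:
            continue  # filtered: other correct answers must not worsen the rank
        if score > target:
            rank += 1
    return rank


# Hypothetical scores for four candidate tails.
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
print(filtered_tail_rank(scores, "c", known_tails=set()))  # 3 (raw rank)
print(filtered_tail_rank(scores, "c", known_tails={"a"}))  # 2 (filtered rank)
```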
diff --git a/config/quick_start.yaml b/config/demo/quick_start.yaml
@@ -6,6 +6,10 @@ resource:
   cpu_per_gpu: 8
   dim: 128
 
+format:
+  delimiters: " \t\r\n"
+  comment: "#"
+
 graph:
   file_name: <blogcatalog.train>
   as_undirected: true
@@ -30,10 +34,13 @@ train:
   log_frequency: 1000
 
 evaluate:
-  task: node classification
-  file_name: <blogcatalog.label>
-  portions: [0.2]
-  times: 1
+  - task: link prediction
+    file_name: <blogcatalog.test>
+    filter_file: <blogcatalog.train>
+  - task: node classification
+    file_name: <blogcatalog.label>
+    portions: [0.2]
+    times: 1
 
 save:
   file_name: line_blogcatalog.pkl
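
Note on the new `format` section above: it sets the delimiter characters and the comment marker for input files. As a minimal sketch of how such settings could be applied to one line of an edge list, assuming everything after the comment marker is ignored, consider the hypothetical helper `parse_line` below. It is not GraphVite's parser.

```python
def parse_line(line, delimiters=" \t\r\n", comment="#"):
    """Split one line into tokens, dropping anything after the comment marker."""
    # Keep only the part before the first comment marker.
    content = line.split(comment, 1)[0]
    tokens = []
    current = []
    for ch in content:
        if ch in delimiters:
            # A delimiter ends the current token; runs of delimiters are collapsed.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


print(parse_line("alice\tbob 1.5  # weighted edge\n"))  # ['alice', 'bob', '1.5']
print(parse_line("# a full-line comment\n"))            # []
```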
diff --git a/config/graph/deepwalk_flickr.yaml b/config/graph/deepwalk_flickr.yaml
@@ -7,7 +7,7 @@ resource:
   dim: 128
 
 graph:
-  file_name: <flickr.train>
+  file_name: <flickr.graph>
   as_undirected: true
 
 build:

diff --git a/config/graph/deepwalk_friendster-small.yaml b/config/graph/deepwalk_friendster-small.yaml
@@ -7,7 +7,7 @@ resource:
   dim: 128
 
 graph:
-  file_name: <friendster.small_train>
+  file_name: <friendster.small_graph>
   as_undirected: true
 
 build:

diff --git a/config/graph/deepwalk_friendster.yaml b/config/graph/deepwalk_friendster.yaml
@@ -7,7 +7,7 @@ resource:
   dim: 96
 
 graph:
-  file_name: <friendster.train>
+  file_name: <friendster.graph>
   as_undirected: true
 
 build:

diff --git a/config/graph/deepwalk_youtube.yaml b/config/graph/deepwalk_youtube.yaml
@@ -7,7 +7,7 @@ resource:
   dim: 128
 
 graph:
-  file_name: <youtube.train>
+  file_name: <youtube.graph>
   as_undirected: true
 
 build:

diff --git a/config/graph/line_flickr.yaml b/config/graph/line_flickr.yaml
@@ -7,7 +7,7 @@ resource:
   dim: 128
 
 graph:
-  file_name: <flickr.train>
+  file_name: <flickr.graph>
   as_undirected: true
 
 build:

diff --git a/config/graph/line_friendster-small.yaml b/config/graph/line_friendster-small.yaml
@@ -7,7 +7,7 @@ resource:
   dim: 128
 
 graph:
-  file_name: <friendster.small_train>
+  file_name: <friendster.small_graph>
   as_undirected: true
 
 build:

diff --git a/config/graph/line_friendster.yaml b/config/graph/line_friendster.yaml
@@ -7,7 +7,7 @@ resource:
   dim: 96
 
 graph:
-  file_name: <friendster.train>
+  file_name: <friendster.graph>
   as_undirected: true
 
 build:

diff --git a/config/graph/line_youtube.yaml b/config/graph/line_youtube.yaml
@@ -7,7 +7,7 @@ resource:
   dim: 128
 
 graph:
-  file_name: <youtube.train>
+  file_name: <youtube.graph>
   as_undirected: true
 
 build:

diff --git a/config/graph/node2vec_youtube.yaml b/config/graph/node2vec_youtube.yaml
@@ -7,7 +7,7 @@ resource:
   dim: 128
 
 graph:
-  file_name: <youtube.train>
+  file_name: <youtube.graph>
   as_undirected: true
 
 build:

diff --git a/config/knowledge_graph/complex_fb15k-237.yaml b/config/knowledge_graph/complex_fb15k-237.yaml
@@ -1,5 +1,5 @@
 application:
-  knowledge_graph
+  knowledge graph
 
 resource:
   gpus: []
@@ -12,7 +12,7 @@ graph:
 build:
   optimizer:
     type: Adam
-    lr: 5.0e-4
+    lr: 2.0e-5
     weight_decay: 0
   num_partition: auto
   num_negative: 64