Yuyang Wang, Jianren Wang, Zhonglin Cao, Amir Barati Farimani
Carnegie Mellon University
This is the official implementation of MolCLR: "Molecular Contrastive Learning of Representations via Graph Neural Networks". In this work, we introduce a contrastive learning framework for molecular representation learning on a large unlabelled dataset of ~10M unique molecules. MolCLR pre-training greatly boosts the performance of GNN models on various downstream molecular property prediction benchmarks. If you find our work useful in your research, please cite:
@article{wang2022molclr,
  title={Molecular contrastive learning of representations via graph neural networks},
  author={Wang, Yuyang and Wang, Jianren and Cao, Zhonglin and Barati Farimani, Amir},
  journal={Nature Machine Intelligence},
  pages={1--9},
  year={2022},
  publisher={Nature Publishing Group},
  doi={10.1038/s42256-022-00447-x}
}
Set up the conda environment and clone the GitHub repository:
# create a new environment
$ conda create --name molclr python=3.7
$ conda activate molclr
# install requirements
$ pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
$ pip install torch-geometric==1.6.3 torch-sparse==0.6.9 torch-scatter==2.0.6 -f https://pytorch-geometric.com/whl/torch-1.7.0+cu110.html
$ pip install PyYAML
$ conda install -c conda-forge rdkit=2020.09.1.0
$ conda install -c conda-forge tensorboard
$ conda install -c conda-forge nvidia-apex # optional
# clone the source code of MolCLR
$ git clone https://github.com/yuyangw/MolCLR.git
$ cd MolCLR
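After installation, a quick sanity check can confirm that the pinned packages import correctly and that CUDA is visible (a minimal sketch; run it from any Python shell inside the molclr environment):

import torch
import torch_geometric
from rdkit import Chem

# Print versions to confirm the pinned packages resolved correctly
print("torch:", torch.__version__)                       # expect 1.7.1+cu110
print("torch_geometric:", torch_geometric.__version__)   # expect 1.6.3
print("CUDA available:", torch.cuda.is_available())

# RDKit smoke test: parse a simple SMILES string (ethanol)
mol = Chem.MolFromSmiles("CCO")
print("RDKit parsed ethanol with", mol.GetNumAtoms(), "atoms")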
You can download the pre-training data and the benchmarks used in the paper here, and extract the zip file under the ./data folder. The pre-training data can be found in pubchem-10m-clean.txt. Each fine-tuning dataset is saved in a folder named after its benchmark. You can also find the benchmarks on MoleculeNet.
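Each line of pubchem-10m-clean.txt is a SMILES string. As a rough illustration of how such strings become graph inputs for a GNN (the repository's own featurization uses richer atom and bond features; this is only a minimal sketch), an RDKit-to-PyG conversion might look like:

import torch
from rdkit import Chem
from torch_geometric.data import Data

def smiles_to_graph(smiles):
    # Parse the SMILES string into an RDKit molecule
    mol = Chem.MolFromSmiles(smiles)
    # Node features: atomic number of each atom (the paper uses richer features)
    x = torch.tensor([[a.GetAtomicNum()] for a in mol.GetAtoms()], dtype=torch.long)
    # Edges: each bond contributes two directed edges
    edges = []
    for b in mol.GetBonds():
        i, j = b.GetBeginAtomIdx(), b.GetEndAtomIdx()
        edges += [(i, j), (j, i)]
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
    return Data(x=x, edge_index=edge_index)

print(smiles_to_graph("CCO"))  # Data(x=[3, 1], edge_index=[2, 4])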
To pre-train MolCLR, run the command below. The configuration and a detailed explanation of each variable can be found in config.yaml:
$ python molclr.py
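Pre-training follows the SimCLR recipe: two augmented views of the same molecular graph form a positive pair, and an NT-Xent contrastive loss pulls their embeddings together while pushing apart embeddings of different molecules. A minimal, self-contained sketch of that loss (the function name and temperature value here are illustrative; see the repository's loss implementation for the exact version used):

import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    # z1, z2: embeddings of the two views, each of shape (N, D)
    N = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit vectors
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                    # mask self-similarity
    # For row i, the positive is the other view of the same molecule
    targets = torch.cat([torch.arange(N, 2 * N), torch.arange(0, N)])
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 64), torch.randn(8, 64)
print(nt_xent(z1, z2).item())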
To monitor the training via tensorboard, run tensorboard --logdir ckpt/{PATH} and open http://127.0.0.1:6006/ in a browser.
To fine-tune the MolCLR pre-trained model on downstream molecular benchmarks, run the command below. The configuration and a detailed explanation of each variable can be found in config_finetune.yaml:
$ python finetune.py
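Both scripts read their settings with PyYAML, so you can inspect or tweak a configuration programmatically before a run (a sketch; the authoritative key names live in config_finetune.yaml itself):

import yaml

with open('config_finetune.yaml') as f:
    config = yaml.safe_load(f)   # parse the YAML file into a nested dict
print(config)                    # inspect the available keys before editing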
We also provide pre-trained GIN and GCN models, which can be found in ckpt/pretrained_gin and ckpt/pretrained_gcn, respectively.
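To inspect or reuse a checkpoint directly, you can load its state dict with plain PyTorch (a sketch; the exact file layout inside the ckpt folders may differ, so adjust the path to the .pth file you find there):

import torch

# Load the pre-trained GIN weights onto CPU; the filename here is an assumption
state_dict = torch.load('ckpt/pretrained_gin/checkpoints/model.pth',
                        map_location='cpu')
# List the stored parameter tensors and their shapes
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))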
- PyTorch implementation of SimCLR: https://github.com/sthalles/SimCLR
- Strategies for Pre-training Graph Neural Networks: https://github.com/snap-stanford/pretrain-gnns