GitHub - C0nc/SMG: SMG: self-supervised masked graph learning for cancer gene identification.

SMG: self-supervised masked graph learning for cancer gene identification.

Installation

Clone the repository:

```
git clone https://github.com/C0nc/SMG.git
```

Navigate to the project directory:
```
cd SMG
```
Install the required dependencies:
```
pip install -r requirements.txt
```
This will install all the necessary packages specified in the requirements.txt file.

Usage

Predefined protein-protein interaction (PPI) networks, referenced by their index in this list:

['CPDB', 'IRefIndex', 'PCNet', 'IRefIndex_2015', 'STRINGdb', 'Multinet']
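
The `ppi=0` entry in the example output further down suggests that a network is selected by its position in this list. A minimal sketch of that lookup, assuming zero-based indexing (the helper name `ppi_name` is hypothetical and not part of the repository):

```python
# Network names in the order given above.
PPI_NETWORKS = ['CPDB', 'IRefIndex', 'PCNet', 'IRefIndex_2015', 'STRINGdb', 'Multinet']


def ppi_name(index: int) -> str:
    """Return the network name for an index (assumed zero-based)."""
    if not 0 <= index < len(PPI_NETWORKS):
        raise ValueError(f"ppi index must be in 0..{len(PPI_NETWORKS) - 1}, got {index}")
    return PPI_NETWORKS[index]


print(ppi_name(0))  # CPDB
print(ppi_name(4))  # STRINGdb
```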

Train the model to classify gene nodes with semi-supervised transductive learning:
```
python main_transductive.py [arguments]
```
Pass the arguments your experiment needs. The available arguments are:
- --ppi: Index of the protein-protein interaction network used for training, taken from the predefined list above.
- --inductive-ppi: Index of the protein-protein interaction network used for testing, taken from the same list (only needed for inductive learning).
- --expression: Switch the task to essential gene prediction.
- --health: Switch the task to health gene prediction.
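
To illustrate how these flags end up in the `Namespace` printed in the example output, here is a hedged `argparse` sketch. It is not the repository's parser: the integer types, the `store_true` action, and the meaning of `-1` are assumptions, and the defaults are copied from the example output (`ppi=0`, `inductive_ppi=-1`, `health=False`).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser covering only three of the documented flags.
    parser = argparse.ArgumentParser(description="SMG argument sketch")
    parser.add_argument("--ppi", type=int, default=0,
                        help="index of the training PPI network")
    parser.add_argument("--inductive-ppi", type=int, default=-1,
                        help="index of the test PPI network (-1 assumed to mean unused)")
    parser.add_argument("--health", action="store_true",
                        help="switch to health gene prediction")
    return parser


# argparse stores "--inductive-ppi" under the attribute name "inductive_ppi".
args = build_parser().parse_args(["--ppi", "2", "--inductive-ppi", "4"])
print(args.ppi, args.inductive_ppi, args.health)  # 2 4 False
```

On the command line the same selection would look like `python main_transductive.py --ppi 2 --inductive-ppi 4`.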

Example output:

```
Namespace(seeds=[0], device=0, max_epoch=1500, warmup_steps=-1, num_heads=4, num_out_heads=1, num_layers=3, num_hidden=256, residual=True, in_drop=0.4, attn_drop=0.1, norm='layernorm', lr=0.1, weight_decay=0, negative_slope=0.2, activation='relu', mask_rate=0.5, drop_edge_rate=0.2, replace_rate=0.2, encoder='gcn', decoder='gcn', loss_fn='sce', alpha_l=3, optimizer='adam', max_epoch_f=500, lr_f=0.01, weight_decay_f=0.001, linear_prob=False, load_model=False, save_model=False, use_cfg=False, logging=False, scheduler=True, concat_hidden=False, pooling='mean', deg4feat=False, batch_size=32, inductive_ppi=-1, ppi=0, health=False, inducitve=False, essential=False, task='GIN_graph', data_path='', GE=False, IGE=False)
####### Run 0 for seed 0
! Linear Residual !
Identity Residual 
Identity Residual 
sce
2023-06-12 19:08:46,515 - INFO - Use scheduler
2023-06-12 19:08:46,519 - INFO - Start training...
# Epoch 1499: train_loss: 0.1404: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1500/1500 [00:45<00:00, 32.87it/s]
num parameters for finetuning: 199299
tensor(537., device='cuda:0') tensor(1476., device='cuda:0')
# Epoch: 499, train_loss: 0.4631, val_loss: 0.8206, val_auc:0.7747391209580134, test_loss: 0.6556, test_aupr: 0.8171: 100%|██████████| 500/500 [00:07<00:00, 68.73it/s]
--- Testaupr: 0.8171, early-stopping-Testaupr: 0.8162, Best Valaupr: 0.7836 in epoch 452 --- 
# final_aupr.8162±0.0000
# early-stopping_aupr.8162±0.0000
```
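
When comparing several runs, it can help to pull the metrics out of the `--- Testaupr: ... ---` summary line. A minimal sketch, assuming the line format stays exactly as printed in the example output; `parse_summary` is a hypothetical helper and not part of the repository.

```python
import re

# Pattern for the summary line printed at the end of fine-tuning.
SUMMARY = re.compile(
    r"Testaupr: (?P<test>\d+\.\d+), "
    r"early-stopping-Testaupr: (?P<early>\d+\.\d+), "
    r"Best Valaupr: (?P<val>\d+\.\d+) in epoch (?P<epoch>\d+)"
)


def parse_summary(line: str) -> dict:
    """Extract the AUPR values and the best epoch from a summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not a summary line")
    return {
        "test_aupr": float(match["test"]),
        "early_stopping_test_aupr": float(match["early"]),
        "best_val_aupr": float(match["val"]),
        "best_epoch": int(match["epoch"]),
    }


line = "--- Testaupr: 0.8171, early-stopping-Testaupr: 0.8162, Best Valaupr: 0.7836 in epoch 452 ---"
print(parse_summary(line))
```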

Train the model for graph-level property prediction:
```
python main_graph.py --use_cfg [arguments]
```
- `task`: Choose the architecture, either `GIN_graph` or `GCN_graph`.

Example output:

```
2023-06-12 19:34:13,410 - INFO - Using best configs
------ Use best configs ------
Namespace(seeds=[0], device=0, max_epoch=60, warmup_steps=-1, num_heads=2, num_out_heads=1, num_layers=2, num_hidden=512, residual=False, in_drop=0.2, attn_drop=0.1, norm='batchnorm', lr=0.00015, weight_decay=0.0, negative_slope=0.2, activation='relu', mask_rate=0.5, drop_edge_rate=0.0, replace_rate=0.0, encoder='gin', decoder='gin', loss_fn='sce', alpha_l=1, optimizer='adam', max_epoch_f=500, lr_f=0.005, weight_decay_f=0.0, linear_prob=True, load_model=False, save_model=False, use_cfg=True, logging=False, scheduler=False, concat_hidden=False, pooling='mean', deg4feat=False, batch_size=32, inductive_ppi=-1, ppi=0, health=False, inducitve=False, essential=False, task='GIN_graph', data_path='', GE=False, IGE=False)
Graphs class 0: 200, Graphs class 1: 306
Length of balanced dataset list: 400
Train graph class 0: 160, train graph class 1: 160
Validation graph class 0: 40, validation graph class 1: 40
####### Run 0 for seed 0
sce
Epoch 59 | train_loss: 0.1025: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [00:48<00:00,  1.25it/s]
#Test_f1: 0.8700±0.0510
# final_acc: 0.8700±0.0000
```
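
The class counts in this log (200 and 306 graphs, 400 after balancing, then 160/40 per class) are consistent with undersampling the larger class to the size of the smaller one, followed by an 80/20 split per class. That reading is an inference from the numbers, not taken from the repository's code. A sketch under that assumption, with a hypothetical helper `balance_and_split`:

```python
import random


def balance_and_split(labels, val_fraction=0.2, seed=0):
    """Undersample every class to the minority size, then split per class.

    Returns (train_indices, val_indices) into `labels`.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    minority = min(len(members) for members in by_class.values())

    train, val = [], []
    for members in by_class.values():
        kept = rng.sample(members, minority)  # undersample to the minority size
        n_val = int(round(minority * val_fraction))
        val.extend(kept[:n_val])
        train.extend(kept[n_val:])
    return train, val


# Class sizes from the log: 200 graphs of class 0, 306 of class 1.
labels = [0] * 200 + [1] * 306
train, val = balance_and_split(labels)
print(len(train) + len(val), len(train), len(val))  # 400 320 80
```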

Run post-hoc explanation with GNNExplainer and Integrated Gradients:
```
python main_transductive.py [arguments]
```
- `GE`: Use GNNExplainer to explain the prediction results.
- `IGE`: Use Integrated Gradients to explain the prediction results.
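
For readers unfamiliar with Integrated Gradients, the method attributes a prediction to each input feature by averaging the model's gradient along a straight path from a baseline to the input, then scaling by the distance from the baseline. The sketch below applies that definition to a toy function with a known gradient. It is not how the repository applies the method to the trained GNN; the toy model, the zero baseline, and all names are assumptions made for illustration.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    grad_fn(point) must return the gradient of the model output at `point`.
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            total[i] += g / steps
    # Scale the path-averaged gradient by the distance from the baseline.
    return [(xi - b) * t for xi, b, t in zip(x, baseline, total)]


# Toy model (hypothetical): F(x) = sum_i w_i * x_i**2, so dF/dx_i = 2 * w_i * x_i.
weights = [1.0, -2.0, 0.5]


def model(x):
    return sum(w * xi ** 2 for w, xi in zip(weights, x))


def grad(x):
    return [2 * w * xi for w, xi in zip(weights, x)]


x, baseline = [1.0, 2.0, -3.0], [0.0, 0.0, 0.0]
attributions = integrated_gradients(grad, x, baseline)
print(attributions)
# Completeness check: attributions should sum to F(x) - F(baseline).
print(abs(sum(attributions) - (model(x) - model(baseline))) < 1e-9)
```

The completeness check at the end is a useful sanity test for any Integrated Gradients implementation: if the attributions do not sum to the difference in model output, the number of integration steps is probably too small.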

Reference Website

Gene set libraries for enrichment analysis can be selected from the list on the Enrichr website:

https://maayanlab.cloud/Enrichr/

More information about cancer genes is available from the Network of Cancer Genes (NCG):

http://ncg.kcl.ac.uk/

Data available

Get the data from this Google Drive link:

https://drive.google.com/file/d/10Bs1-TJZS4BFaVLxI1dR7_127Xhp2nEN/view?usp=drive_link

License

This project is licensed under the MIT License.

Reference

Cui Y et al., SMG: self-supervised masked graph learning for cancer gene identification. Submitted for publication.
