data

Data Explanation

For more detailed information, please refer to the DeepDTA article.

Similarity files

For each dataset, there are two similarity files: drug-drug similarities and target-target similarities.

Drug-drug similarities were obtained via PubChem structure clustering.
Target-target similarities were obtained via Smith-Waterman (S-W) similarity.

These files were used to reproduce the results of two other methods (Pahikkala et al., 2017; He et al., 2017) and for some experiments with the DeepDTA model; please refer to the paper for details.

The original Davis data and more explanation can be found here.
The original KIBA data and more explanation can be found here.

Binding affinity files

For the Davis dataset, the standard value is Kd in nM. In the article, we used the following transformation:

$pK_d = -\log_{10}\left(\frac{K_d}{10^9}\right)$
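
The transformation above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' preprocessing code: the function name `kd_nm_to_pkd` and the example values are assumptions chosen for clarity and are not taken from the dataset.

```python
import numpy as np


def kd_nm_to_pkd(kd_nm):
    """Convert Kd values given in nM to pKd = -log10(Kd / 1e9)."""
    kd_nm = np.asarray(kd_nm, dtype=float)
    # Dividing by 1e9 converts nM to M before taking the negative log.
    return -np.log10(kd_nm / 1e9)


# Illustrative Kd values in nM (hypothetical, not from the Davis files).
example_kd_nm = [10000.0, 100.0, 1.0]
example_pkd = kd_nm_to_pkd(example_kd_nm)
print(example_pkd)
```

With this transformation, a weaker binder (larger Kd) gets a smaller pKd, so 10000 nM maps to about 5 and 1 nM maps to about 9.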

For the KIBA dataset, the standard value is the KIBA score. The two versions of the binding affinity text files correspond to the original values and the transformed values (more information here). In the article, we used the transformed form.
A nan value indicates that there is no experimental value for that drug-target pair.

Train and test folds

There are two files for each dataset: a train fold file and a test fold file. Both files store positions that refer to the binding affinity values in the affinity matrices given in the text files.

Since we performed 5-fold cross-validation, the train fold file contains five different sets of positions.
The test set is the same for all five training sets.

For using the folds

Load affinity matrix Y

    import pickle
    import numpy as np

    # If loading fails under Python 3, try pickle.load(f, encoding="latin1").
    with open("Y", "rb") as f:
        Y = pickle.load(f)

    # Row (drug) and column (protein) indices of all non-nan entries of Y.
    label_row_inds, label_col_inds = np.where(~np.isnan(Y))

label_row_inds: drug indices for the corresponding (flattened) affinity matrix positions.
e.g. the 36275th point in the KIBA Y matrix corresponds to the 364th drug (same order as in the SMILES file):
```
label_row_inds[36275]
```
label_col_inds: protein indices for the corresponding (flattened) affinity matrix positions.

e.g. the 36275th point in the KIBA Y matrix corresponds to the 120th protein (same order as in the protein sequence file):
```
label_col_inds[36275]
```
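
To make the position-to-index mapping concrete, here is a small sketch on a toy matrix. The matrix `Y_toy` and its values are hypothetical; the only point is that `np.where` lists the measured (non-nan) entries in row-major order, so position k refers to the k-th measured drug-protein pair.

```python
import numpy as np

# Toy 2 x 3 affinity matrix (hypothetical values): rows are drugs, columns are proteins.
Y_toy = np.array([[1.0, np.nan, 3.0],
                  [np.nan, 5.0, np.nan]])

# Indices of the measured entries, in row-major order: (0, 0), (0, 2), (1, 1).
toy_row_inds, toy_col_inds = np.where(~np.isnan(Y_toy))

# Print each position with its drug index, protein index, and affinity value.
for point, (drug, protein) in enumerate(zip(toy_row_inds, toy_col_inds)):
    print(point, drug, protein, Y_toy[drug, protein])
```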

You can then load the fold files as follows:

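    import json

    # yourdir is a placeholder for the path to the dataset directory.
    with open(yourdir + "folds/test_fold_setting1.txt") as f:
        test_fold = json.load(f)
    with open(yourdir + "folds/train_fold_setting1.txt") as f:
        train_folds = json.load(f)

    # Map the test positions back to drug and protein indices.
    test_drug_indices = label_row_inds[test_fold]
    test_protein_indices = label_col_inds[test_fold]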

Remember that train_folds contains 5 lists, each of which corresponds to a training set.
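
As a rough sketch of how the five lists might be consumed, the loop below gathers drug indices, protein indices, and affinity values for each list of positions. Everything here is a toy stand-in: `Y_toy` and `toy_train_folds` are hypothetical and only mimic the shapes described above; this is not the training code used in the article.

```python
import numpy as np

# Toy affinity matrix (hypothetical values); nan marks unmeasured pairs.
Y_toy = np.array([[5.0, np.nan, 7.2],
                  [np.nan, 6.1, 8.0],
                  [4.9, 5.5, np.nan]])
row_inds, col_inds = np.where(~np.isnan(Y_toy))  # 6 measured pairs

# Hypothetical stand-in for train_folds: 5 lists of positions.
toy_train_folds = [[0, 1], [2, 3], [4], [5, 0], [1, 2]]

fold_data = []
for fold in toy_train_folds:
    positions = np.asarray(fold)
    drugs = row_inds[positions]          # drug index per position
    proteins = col_inds[positions]       # protein index per position
    affinities = Y_toy[drugs, proteins]  # measured affinity per pair
    fold_data.append((drugs, proteins, affinities))

print(len(fold_data), fold_data[0])
```

Because the positions index into the list of measured pairs, the gathered affinities never contain nan.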
