This repository contains the official implementation of our paper titled SAINT-Angle: self-attention augmented inception-inside-inception network and transfer learning improve protein backbone torsion angle prediction.
Protein structure provides valuable insights into how proteins function within living organisms as well as how proteins interact with one another. Prediction of protein backbone torsion angles (φ and ψ) is therefore a key subproblem in protein structure prediction. However, existing experimental methods for determining backbone torsion angles are costly and time-consuming. Hence, the research community is focusing on developing computational methods for predicting torsion angles.
In this paper, we present SAINT-Angle, a highly accurate method for protein backbone torsion angle prediction. SAINT-Angle uses a self-attention based deep learning architecture called SAINT, which was previously developed in our lab for protein secondary structure prediction. We extended and improved the existing SAINT architecture and used transfer learning to predict the backbone torsion angles. We conducted a thorough analysis and compared the performance of SAINT-Angle with contemporary state-of-the-art prediction methods on a collection of publicly available benchmark datasets, namely, TEST2016, TEST2018, CAMEO, and CASP. The experimental results suggest that our proposed method achieves notable improvements over the best alternative methods.
To run SAINT-Angle, create a workspace directory. Download the `inference.py` script and place it inside the workspace directory.
You may create a separate Conda environment for running SAINT-Angle and install TensorFlow (version 2.6.0) and Keras (version 2.6.0). We have tested our code with these versions of TensorFlow and Keras. For faster inference, we recommend installing the GPU version of TensorFlow.
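Before running inference, it can help to confirm that the active environment matches the tested versions. The following is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and it assumes the packages are installed under the distribution names `tensorflow` and `keras`.

```python
from importlib import metadata

# Versions this README states the code was tested with.
TESTED_VERSIONS = {"tensorflow": "2.6.0", "keras": "2.6.0"}


def check_versions(expected=TESTED_VERSIONS, get_version=metadata.version):
    """Map each package to (installed version or None, whether it matches)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed under this name
        report[package] = (installed, installed == wanted)
    return report


for package, (installed, ok) in check_versions().items():
    status = "OK" if ok else "MISMATCH"
    print(f"{package}: installed={installed}, tested={TESTED_VERSIONS[package]} [{status}]")
```

If the GPU build was installed under a different distribution name, the lookup for `tensorflow` reports `None`; in that case adjust the keys of `TESTED_VERSIONS` to match your installation.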
You will find here all the pretrained model weights required for running SAINT-Angle. Download these model weights and place them in a folder named `models` inside the workspace directory. There are 12 model weight files in total (with the `.h5` extension). The details of each model are given below.
Model Name | Architecture | Features | Model Name | Architecture | Features |
---|---|---|---|---|---|
model_1 | Basic | Base | model_7 | Residual | Base, ProtTrans |
model_2 | Basic | Base, Win10 | model_8 | Residual | Base, Win10, ProtTrans |
model_3 | ProtTrans | Base, ProtTrans | model_9 | Basic | ESIDEN |
model_4 | ProtTrans | Base, Win10, ProtTrans | model_10 | Basic | ESIDEN, HMM |
model_5 | ProtTrans | Base, Win20, ProtTrans | model_11 | ProtTrans | ESIDEN, HMM, ProtTrans |
model_6 | ProtTrans | Base, Win50, ProtTrans | model_12 | Residual | ESIDEN, HMM, ProtTrans |
Here, Base means the feature set consisting of PSSM, HMM, and PCP features from SPOT-1D. Win10, Win20, and Win50 mean the window features from SPOT-Contact. ProtTrans means the extracted features from ProtTrans [Github]. ESIDEN means the feature set consisting of PSSM, AA, PCP, DC, RE, PSSP, and RBP features from ESIDEN.
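Since all 12 weight files are expected in the `models` folder, a quick check can catch an incomplete download. This is a minimal sketch under the file layout described above; the helper name `missing_weights` is hypothetical and not part of the repository.

```python
from pathlib import Path

# The 12 single models listed in the table: model_1 ... model_12.
EXPECTED_MODELS = [f"model_{i}" for i in range(1, 13)]


def missing_weights(models_dir="models"):
    """Return the names of models whose .h5 file is absent from models_dir."""
    models_dir = Path(models_dir)
    return [
        name
        for name in EXPECTED_MODELS
        if not (models_dir / f"{name}.h5").is_file()
    ]


missing = missing_weights("models")
if missing:
    print("Missing weight files:", ", ".join(f"{name}.h5" for name in missing))
else:
    print("All 12 weight files found.")
```

Run it from the workspace directory so that the relative path `models` resolves correctly.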
To predict backbone torsion angles using SAINT-Angle, you may use any of the given models. SAINT-Angle also provides 2 ensembles, each combining a different set of the given models. The details of each ensemble are given below.
Ensemble Name | Base Models Used |
---|---|
ensemble_3 | model_10, model_11, model_12 |
ensemble_8 | model_1, model_2, model_3, model_4, model_5, model_6, model_7, model_8 |
By providing the appropriate command line argument (discussed in the subsequent section), you may use any of the aforementioned models or ensembles for torsion angle prediction.
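The relationship between a model or ensemble name and the weight files it needs can be sketched as a simple lookup. This is an illustration only: `resolve_weights` is a hypothetical helper, and how `inference.py` loads and combines the ensemble members internally is not described here.

```python
from pathlib import Path

# Ensemble membership copied from the table above.
ENSEMBLES = {
    "ensemble_3": ["model_10", "model_11", "model_12"],
    "ensemble_8": [f"model_{i}" for i in range(1, 9)],
}
SINGLE_MODELS = {f"model_{i}" for i in range(1, 13)}


def resolve_weights(model_name, models_dir="models"):
    """Return the .h5 paths needed for a single model or an ensemble."""
    if model_name in ENSEMBLES:
        names = ENSEMBLES[model_name]
    elif model_name in SINGLE_MODELS:
        names = [model_name]
    else:
        raise ValueError(f"Unknown model or ensemble name: {model_name!r}")
    return [Path(models_dir) / f"{name}.h5" for name in names]


for path in resolve_weights("ensemble_3"):
    print(path)
```

A single model name resolves to one file, while an ensemble name resolves to the files of all its base models, so every base model of an ensemble must be present in the `models` folder.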
- Open a terminal window and navigate to your workspace directory using the `cd` command (if necessary).

- To run SAINT-Angle for predicting protein backbone torsion angles, type the following command in the terminal.

  `python inference.py`

  This command runs `inference.py` with default arguments, using `models/model_1.h5` as the model and `datasets/TEST2016_A` as the dataset.

- You may change the model used for inference by providing the model parameter.

  `-m MODEL_NAME` or `--model MODEL_NAME` sets the inference script to use either the `models/MODEL_NAME.h5` model (for inference with a single model) or the `MODEL_NAME` ensemble (for inference with an ensemble of models) for the prediction. You must use one of the following `MODEL_NAME` values to specify the model that you want to use for the inference.

  `model_1 model_2 model_3 model_4 model_5 model_6 model_7 model_8 model_9 model_10 model_11 model_12 ensemble_3 ensemble_8`

- You may change the dataset used for inference by providing the dataset parameter.

  `-d DATASET_NAME` or `--dataset DATASET_NAME` sets the inference script to use features from the `datasets/DATASET_NAME` dataset to predict the backbone torsion angles of the protein(s) belonging to the `DATASET_NAME` dataset.

- `--verbose` sets the inference script to print out detailed messages, and `--output` sets the inference script to write the predicted angles to the `predictions` folder inside the workspace directory (discussed in the subsequent section).

- `-h` or `--help` sets the inference script to display the help message.
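The options above can be summarized as a command line parser. The following is a minimal sketch that mirrors the documented flags and defaults; it is not the actual parser in `inference.py`, the function name `build_parser` is hypothetical, and treating `--verbose` and `--output` as on/off switches is an assumption based on the descriptions above.

```python
import argparse

# Accepted MODEL_NAME values, as listed above.
MODEL_CHOICES = [f"model_{i}" for i in range(1, 13)] + ["ensemble_3", "ensemble_8"]


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="SAINT-Angle inference (sketch)")
    parser.add_argument("-m", "--model", default="model_1", choices=MODEL_CHOICES,
                        help="single model or ensemble to use")
    parser.add_argument("-d", "--dataset", default="TEST2016_A",
                        help="name of a folder inside datasets/")
    # Assumed to be boolean switches; the real script may differ.
    parser.add_argument("--verbose", action="store_true",
                        help="print detailed messages")
    parser.add_argument("--output", action="store_true",
                        help="write predicted angles to the predictions folder")
    return parser


# Equivalent to: python inference.py -m ensemble_3 -d TEST2016_A --verbose
args = build_parser().parse_args(["-m", "ensemble_3", "-d", "TEST2016_A", "--verbose"])
print(args.model, args.dataset, args.verbose, args.output)
```

`argparse` adds `-h` and `--help` automatically, which matches the help option described above.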