a score inference server using protein folding models and FastAPI

Oaklight/protein-score-server

Protein Structure Score Server

1. Introduction

This server is a protein structure prediction tool that processes prediction requests from users and returns various scores for protein sequences.

2. Installation

To install the environment, follow these steps:

git clone https://github.com/Oaklight/protein-score-server.git
cd protein-score-server
conda env create -f env/environment.yaml
conda activate esm
pip install -r env/requirements.txt

3. Server Configuration

Configuration File:

  • Copy server.yaml.sample to server.yaml:
cp server.yaml.sample server.yaml
  • Edit server.yaml with your settings.

The server uses the server.yaml file for configuration. Currently configurable items include:

  • api_key: API key for Hugging Face Hub login.
  • history_path: History result storage path.
  • intermediate_pdb_path: Intermediate PDB file storage path.
  • model: Model configuration
    • name: Model name, either esmfold or protenix (ByteDance's AlphaFold 3 implementation).
    • replica: Mapping from GPU device to number of replicas, in <device>: <num_replica> format. For esmfold, use _: <num_replica> instead.
  • task_queue_size: Task queue size, defaults to 50.
  • timeout: Timeout for async prediction result retrieval, defaults to 15 seconds.
  • backbone_pdb:
    • reversed_index: Path to the reverse index that maps PDB IDs to PDB file paths.
    • parquet_prefix: Path prefix for parquet files.
    • pdb_prefix: Path prefix for PDB files.

For example, see server.yaml
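
As a rough illustration of the configuration layout described above, the following Python sketch checks a loaded configuration (represented here as a plain dict) for the documented keys. The helper name missing_keys, the example values, and the idea that every documented key must be present are assumptions made for illustration; the server itself may treat some keys as optional.

```python
from typing import Any, Dict, List, Tuple

# Keys listed in this README; nested keys are given per section.
DOCUMENTED_KEYS: Dict[str, Tuple[str, ...]] = {
    "api_key": (),
    "history_path": (),
    "intermediate_pdb_path": (),
    "model": ("name", "replica"),
    "task_queue_size": (),
    "timeout": (),
    "backbone_pdb": ("reversed_index", "parquet_prefix", "pdb_prefix"),
}


def missing_keys(config: Dict[str, Any]) -> List[str]:
    """Return documented keys that are absent from config (hypothetical helper)."""
    missing: List[str] = []
    for key, nested in DOCUMENTED_KEYS.items():
        if key not in config:
            missing.append(key)
            continue
        section = config[key] if isinstance(config[key], dict) else {}
        for sub in nested:
            if sub not in section:
                missing.append(f"{key}.{sub}")
    return missing


# Placeholder values only; paths and the API key are made up for this sketch.
EXAMPLE_CONFIG: Dict[str, Any] = {
    "api_key": "<your-hugging-face-token>",
    "history_path": "data/history",
    "intermediate_pdb_path": "data/intermediate_pdb",
    "model": {"name": "esmfold", "replica": {"_": 1}},
    "task_queue_size": 50,
    "timeout": 15,
    "backbone_pdb": {
        "reversed_index": "data/reversed_index",
        "parquet_prefix": "data/parquet",
        "pdb_prefix": "data/pdb",
    },
}

print(missing_keys(EXAMPLE_CONFIG))
```

In practice the dict would come from parsing server.yaml with a YAML library; that step is left out so the sketch depends only on the standard library.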

After the configuration is set, run these commands inside the project folder:

conda activate esm
uvicorn main:app --host 0.0.0.0 --port 8000

4. Usage

4.1. Request Prediction

Users can send POST requests to http://your-host:8000/predict/ to get predictions. The request body comprises these fields: seq, name, type, seq2.

  • seq: String, representing the protein sequence.
  • name: String, representing the name of the reference protein.
  • type: String, representing the task type, currently supports "plddt", "tmscore", "sc-tmscore", "pdb".
  • seq2: String, representing the sequence of the reference protein. Used only for the sc-tmscore task, where you may provide either seq2 or name.
  1. pLDDT
{
    "seq": "MKRESHKHAEQARRNRLAVALHELASLIPAEWKQQNVSAAPSKATTVEAACRYIRHLQQNGST",
    "type": "plddt"
}
  2. TMscore
{
    "seq": "MKRESHKHAEQARRNRLAVALHELASLIPAEWKQQNVSAAPSKATTVEAACRYIRHLQQNGST",
    "name": "1a0a.A", # must provide for tasks that require a reference structure
    "type": "tmscore"
}
  3. sc-TMscore
{
    "seq": "MKRESHKHAEQARRNRLAVALHELASLIPAEWKQQNVSAAPSKATTVEAACRYIRHLQQNGST",
    "seq2": "MKRESHKHAEQARRNRLAVALHELASLIPAEWKQQNVSAAPSKATTVEAACRYIRHLQQNGST", # choose to provide either seq2 or name
    "type": "sc-tmscore"
}

or

{
    "seq": "MKRESHKHAEQARRNRLAVALHELASLIPAEWKQQNVSAAPSKATTVEAACRYIRHLQQNGST",
    "name": "1a0a.A", # choose to provide either seq2 or name
    "type": "sc-tmscore"
}
  4. pdb
{
    "seq": "MKRESHKHAEQARRNRLAVALHELASLIPAEWKQQNVSAAPSKATTVEAACRYIRHLQQNGST",
    "type": "pdb"
}
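
The request bodies above could be assembled and submitted from Python roughly as follows. This is a minimal sketch using only the standard library: build_payload and submit are hypothetical helper names, the client-side checks simply mirror the field descriptions in this section (the server's own validation may differ), and http://your-host:8000 is the placeholder host used throughout this README.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Task types listed in this README.
TASK_TYPES = {"plddt", "tmscore", "sc-tmscore", "pdb"}


def build_payload(
    seq: str,
    task_type: str,
    name: Optional[str] = None,
    seq2: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a request body for /predict/ (client-side checks are assumptions)."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type!r}")
    if task_type == "tmscore" and name is None:
        raise ValueError("tmscore needs the name of a reference protein")
    if task_type == "sc-tmscore" and name is None and seq2 is None:
        raise ValueError("sc-tmscore needs either seq2 or name")
    payload: Dict[str, Any] = {"seq": seq, "type": task_type}
    if name is not None:
        payload["name"] = name
    if seq2 is not None:
        payload["seq2"] = seq2
    return payload


def submit(base_url: str, payload: Dict[str, Any], timeout: float = 15.0) -> Dict[str, Any]:
    """POST the payload to <base_url>/predict/ and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/predict/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


SEQ = "MKRESHKHAEQARRNRLAVALHELASLIPAEWKQQNVSAAPSKATTVEAACRYIRHLQQNGST"
print(build_payload(SEQ, "tmscore", name="1a0a.A"))
# With a running server: submit("http://your-host:8000", build_payload(SEQ, "plddt"))
```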

The server will return a JSON response containing two fields: job_id and prediction.

  • job_id: String, representing the task ID.
  • prediction: String; at this stage it only indicates that the prediction is being processed.
{
    "job_id": "0a98a981748c4b7eacfd5e0957905ced", # this is a uuid4 hex string
    "prediction": ... # not very useful at this moment
}

4.2. Result Retrieval

Users can send GET requests to http://your-host:8000/result/{job_id} to get prediction results. The request header should contain Content-Type: application/json.

The server will return a JSON response containing two fields: job_id and prediction, where prediction holds the result once the job has finished.

{
    "job_id": "0a98a981748c4b7eacfd5e0957905ced", # this is a uuid4 hex string
    "prediction": 0.983124
}

4.3. Error Handling

  • 202: Job is being processed. Please wait.
  • 400: Task input information error. Check detailed messages.
  • 404: Task ID does not exist in server records.
  • 429: Server job queue is currently full. Please wait.
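
One way a client might act on these status codes is sketched below. The function name next_action and the three action labels are hypothetical, and treating 200 as the success code is an assumption, since this README lists only the non-success codes.

```python
def next_action(status_code: int) -> str:
    """Map an HTTP status code from the server to a client-side action."""
    if status_code == 200:
        # Assumed success code: the response carries the finished result.
        return "done"
    if status_code in (202, 429):
        # 202: job still being processed; 429: job queue full. Wait and retry.
        return "retry"
    if status_code in (400, 404):
        # 400: bad task input; 404: unknown task ID. Retrying will not help.
        return "fail"
    # Anything else is not documented here; treat it as a failure.
    return "fail"


for code in (200, 202, 400, 404, 429):
    print(code, next_action(code))
```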

4.4. Retry Strategy

  • An exponential backoff strategy with a base of 3 is recommended when querying for results.
  • An example of querying is available in test.py.
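
The backoff recommendation above could look roughly like this in Python. It is a sketch, not the logic in test.py: backoff_delays and poll_result are hypothetical names, the 1-second initial delay and the attempt limit are assumptions, and fetch stands in for whatever function performs the GET on /result/{job_id} and returns a (status_code, body) pair.

```python
import time
from typing import Any, Callable, List, Tuple


def backoff_delays(base: float = 3.0, attempts: int = 5, initial: float = 1.0) -> List[float]:
    """Delays in seconds that grow by a factor of base: initial * base**i."""
    return [initial * base ** i for i in range(attempts)]


def poll_result(
    fetch: Callable[[str], Tuple[int, Any]],
    job_id: str,
    base: float = 3.0,
    attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call fetch(job_id) until the status is no longer 202, backing off between tries."""
    for delay in backoff_delays(base, attempts):
        status, body = fetch(job_id)
        if status != 202:
            # Finished or failed: hand the response back to the caller.
            return status, body
        sleep(delay)  # still processing, wait before the next query
    raise TimeoutError(f"job {job_id} still processing after {attempts} attempts")


print(backoff_delays(3.0, 4))
```

Passing sleep as a parameter keeps the loop easy to exercise without real waiting, for example by supplying a function that only records the requested delays.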

5. Server Shutdown

To stop the server, use Ctrl+C in the terminal where the server is running.

6. License

This server is licensed under the Apache License 2.0.