THU-ATOM/AIRFold

AIRFold

AIRFold builds on AlphaFold2 to provide a scalable, systematic solution for protein structure prediction. Its Homology Miner module automatically extracts, analyzes, and processes the co-evolutionary information contained in multiple sequence alignments (MSAs) of homologous protein sequences. AIRFold also integrates several leading structure prediction models, such as AlphaFold2 and RoseTTAFold2 as well as single-sequence models like OmegaFold and ESMFold, and ranks and screens all predicted structures with a model quality estimation (MQE) module. These modules are tied together by a microservices architecture with user-friendly APIs and a web-based graphical interface, so that developers and biochemical researchers can run structure predictions conveniently.

Quick Start

AIRFold uses a microservices architecture, with Docker managing every module and its runtime environment, which keeps deployment and startup simple. It also provides a web interface and an API for submitting and managing structure prediction requests. Follow the steps below to deploy AIRFold and run your first prediction.

[Figures: AIRFold Framework; AIRFold Web Interface]

Installation and running your first prediction

Please follow these steps:

  1. Install Docker.

  2. Clone AIRFold repository:

    git clone https://github.com/health-air/AIRFold
    cd ./AIRFold
  3. Download and prepare the databases (see the Databases section).

  4. Launch AIRFold with a single command:

    docker compose up
  5. Interact with AIRFold via the web page or the RESTful API.

Note: change the IP address and ports as needed; they are specified in docker-compose.yml.
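Before opening the web page or calling the API, it can help to confirm that a service is actually listening on the address you configured. The following is a minimal sketch using only the Python standard library; it checks TCP reachability and nothing more. The function name is illustrative and not part of AIRFold, and the host and port must be taken from your own docker-compose.yml.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `is_port_open("127.0.0.1", 8080)` returns `True` only if something accepts connections there; the address and port in this call are placeholders, not AIRFold defaults.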

Databases

AIRFold searches several databases for MSAs and templates. All databases used are listed below.

Genomics and metagenomics sequence databases

Structure databases

Data structure

├── model_params (models and parameters for AlphaFold2, RoseTTAFold2, etc.)
├── bfd
├── blast_dbs
├── JGIclust
├── metaclust
├── mgnify
├── pdb70
├── pdb_mmcif
├── small_bfd
├── uniclust30
├── uniref30
└── uniref90
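A quick way to catch an incomplete download is to compare a database root against the layout above. The sketch below is an illustration, not an AIRFold utility: the directory names are copied from the tree, while the function name and the idea of a single root directory are assumptions. Whether every directory is required for a given configuration is not stated here, so treat the result as a checklist rather than a hard failure.

```python
from pathlib import Path

# Directory names copied from the layout above.
EXPECTED_DIRS = [
    "model_params", "bfd", "blast_dbs", "JGIclust", "metaclust", "mgnify",
    "pdb70", "pdb_mmcif", "small_bfd", "uniclust30", "uniref30", "uniref90",
]


def missing_databases(root):
    """Return the expected subdirectories that are absent under root."""
    base = Path(root)
    return [name for name in EXPECTED_DIRS if not (base / name).is_dir()]
```

Calling `missing_databases("/path/to/databases")` returns an empty list when every listed directory exists under that (placeholder) path.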

Third-party tools used in AIRFold

MSA-based structure prediction

Single sequence-based structure prediction

Multiple sequence alignment generation

Multiple sequence alignment selection

Protein model quality assessment

Quick commands for main functions

In addition to the web interface and API, AIRFold also provides convenient scripts for the following four functions: multiple sequence alignment generation, pretrained embedding generation, protein contact map prediction, and protein structure prediction.
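The four functions share one command shape, `python run_mode.py --input_path <fasta> --mode <mode>`, so they can be driven from a small wrapper. This is a sketch only: the mode strings are copied verbatim from the commands documented in the following subsections, while `build_command` and `run_mode` are hypothetical helper names, and the wrapper assumes it is run from the directory that contains run_mode.py.

```python
import subprocess
import sys

# Mode strings copied verbatim from the documented commands.
MODES = {
    "msa": "multiple sequence alignment generation",
    "feature": "pretrained embedding generation",
    "disgram": "protein contact map prediction",
    "pipline": "protein structure prediction",
}


def build_command(fasta_path, mode, script="run_mode.py"):
    """Build the argument list for one run_mode.py invocation."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    return [sys.executable, script, "--input_path", fasta_path, "--mode", mode]


def run_mode(fasta_path, mode):
    """Run one mode and raise CalledProcessError if the script exits non-zero."""
    subprocess.run(build_command(fasta_path, mode), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.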

Multiple sequence alignment generation

  • Input: Protein sequences in fasta format.

  • Output: Multiple sequence alignment results in a3m format.

    python run_mode.py --input_path example.fasta --mode msa
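The a3m output can be read with a few lines of Python. In a3m, each record has a `>` header line followed by its sequence, and lowercase letters mark insertions relative to the query; removing them leaves rows of equal length. The sketch below assumes a plain-text a3m file. The function names are illustrative, and the name and location of the file written by `run_mode.py` are not documented here.

```python
def read_a3m(path):
    """Parse an a3m file into a list of (header, sequence) pairs."""
    entries = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    entries.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries


def strip_insertions(sequence):
    """Drop lowercase insertions (and '.' fillers) so the row aligns to the query."""
    return "".join(c for c in sequence if not c.islower() and c != ".")
```

The number of entries returned by `read_a3m` gives the alignment depth, which is a common first check on MSA quality.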

Pretrained embedding generation

  • Input: Protein sequences in fasta format.

  • Output: Generated sequence embeddings in pickle format.

    python run_mode.py --input_path example.fasta --mode feature
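The internal layout of the pickle output is not described here, so a reasonable first step is to load the file and print its structure. The helper below is a generic sketch with hypothetical names; it reports dictionary keys, sequence lengths, and the `shape` attribute of array-like objects without assuming any particular keys. Loading arrays from a pickle requires the library that created them (for example NumPy) to be installed.

```python
import pickle


def summarize(obj, depth=0, max_depth=2):
    """Return lines describing the nested structure of a loaded object."""
    indent = "  " * depth
    shape = getattr(obj, "shape", None)  # present on NumPy arrays and tensors
    if shape is not None:
        return [f"{indent}{type(obj).__name__} shape={tuple(shape)}"]
    if isinstance(obj, dict) and depth < max_depth:
        lines = [f"{indent}dict with {len(obj)} keys"]
        for key, value in obj.items():
            lines.append(f"{indent}  {key!r}:")
            lines.extend(summarize(value, depth + 2, max_depth))
        return lines
    if isinstance(obj, (list, tuple)):
        return [f"{indent}{type(obj).__name__} of length {len(obj)}"]
    return [f"{indent}{type(obj).__name__}"]


def summarize_pickle(path):
    """Load a pickle file and describe its contents."""
    # Only unpickle files you produced yourself: pickle can execute arbitrary code.
    with open(path, "rb") as handle:
        return summarize(pickle.load(handle))
```

Printing `"\n".join(summarize_pickle(path))` gives a compact overview before you write code against specific fields.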

Protein contact map prediction

  • Input: Protein sequences in fasta format.

  • Output: Generated contact map in pickle format.

    python run_mode.py --input_path example.fasta --mode disgram

Protein structure prediction

  • Input: Protein sequences in fasta format.

  • Output: Protein structure in pdb format.

    python run_mode.py --input_path example.fasta --mode pipline
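A predicted structure in PDB format can be inspected without extra dependencies, because ATOM records are fixed-width. The sketch below collects the C-alpha atoms and averages the B-factor column. AlphaFold2 stores per-residue pLDDT confidence in that column; whether AIRFold's output does the same is an assumption to verify against your own files. The function names are illustrative.

```python
def read_ca_atoms(path):
    """Return (chain, residue_number, residue_name, b_factor) for each CA atom."""
    atoms = []
    with open(path) as handle:
        for line in handle:
            # Fixed-width PDB columns: atom name 13-16, residue name 18-20,
            # chain 22, residue number 23-26, B-factor 61-66 (1-based).
            if line.startswith("ATOM") and line[12:16].strip() == "CA":
                atoms.append((
                    line[21],
                    int(line[22:26]),
                    line[17:20].strip(),
                    float(line[60:66]),
                ))
    return atoms


def mean_b_factor(atoms):
    """Average the B-factor column over the given CA atoms."""
    if not atoms:
        raise ValueError("no CA atoms found")
    return sum(atom[3] for atom in atoms) / len(atoms)
```

The length of the list returned by `read_ca_atoms` should match the number of residues in the input sequence for a single-chain prediction.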

Citing this work

If you find our open-source code and models helpful to your research, please consider starring 🌟 and citing 📑 this repo. Thank you for your support!

@misc{airfold,
  author={Xin, Hong and Hongliang, Li and Jingjing, Gong and Yuxuan, Song and Yinjun, Jia and Keyue, Qiu and Han, Tang and Haichuan, Tan and Yanyan, Lan},
  title={AIRFold},
  year={2024},
  howpublished = {\url{https://github.com/health-air/AIRFold}}
}

Please also reference the third-party tools (listed above) you use.

Acknowledgements

We gratefully acknowledge the financial support provided by the National Key R&D Program of China under grant No. 2021YFF1201600. This support has been crucial in enabling our research and development activities.

License and Disclaimer

Copyright 2024 AIR.

AIRFold extends AlphaFold and is licensed under the permissive Apache License, Version 2.0.

Contributing

If you encounter problems using AIRFold, feel free to create an issue! We also welcome pull requests from the community.

Contact Information

For help or issues using this repository, please submit a GitHub issue.

For other communications, please contact Yanyan Lan (lanyanyan@air.tsinghua.edu.cn).