Review
Cell. 2024 Feb 1;187(3):526-544.
doi: 10.1016/j.cell.2023.12.028.

De novo protein design: From new structures to programmable functions

Tanja Kortemme. Cell.

Abstract

Methods from artificial intelligence (AI) trained on large datasets of sequences and structures can now "write" proteins with new shapes and molecular functions de novo, without starting from proteins found in nature. In this Perspective, I will discuss the state of the field of de novo protein design at the juncture of physics-based modeling approaches and AI. New protein folds and higher-order assemblies can be designed with considerable experimental success rates, and difficult problems requiring tunable control over protein conformations and precise shape complementarity for molecular recognition are coming into reach. Emerging approaches incorporate engineering principles (tunability, controllability, and modularity) into the design process from the beginning. Exciting frontiers lie in deconstructing cellular functions with de novo proteins and, conversely, constructing synthetic cellular signaling from the ground up. As methods improve, many more challenges remain unsolved.

Keywords: artificial intelligence; de novo protein design; deep learning; synthetic signaling.

Conflict of interest statement

Declaration of interests: The author declares no competing interests.

Figures

Figure 1. De novo protein design in the age of AI.
(A) Designing proteins de novo (from scratch, without starting from a natural protein) can explore new structures and functions, and design proteins a priori with engineering principles in mind: Proteins could be designed to be tunable in their quantitative properties (rates, affinities, etc.), controllable by arbitrary inputs, and modular such that protein elements can be linked together for diverse input/output behaviors. (B) Advances in AI change the process of de novo protein design. User-defined goals (left) and inputs (middle) are used to generate proteins with new structures and functions (right). Categories 1–4 depict increasingly straightforward prompts leading to increasingly complex design outputs. Blue shading indicates design goals with experimentally validated examples. B-1: AI-based methods to design new protein structures can be unconstrained (generating diverse protein folds; alpha-helices shown in red and beta-strands in yellow) or constrained to diversify a particular fold. B-2: Most current methods to design function specify a “motif” with defined residue positions and orientations in a functional site. In a second step, a protein is generated de novo that surrounds and stabilizes the precise functional site geometry. This process is called “motif scaffolding”. B-3: AI-based methods under development require only the target to be defined; the design method then generates a predicted binder. B-4: Starting from a target function (for example, converting substrate S to product P), an AI method could generate a protein with the requirements for that function. Currently, protein language models trained on specific protein families or large experimental datasets can generate new sequences with functions similar to those in the training set.
Figure 2. Protein design concepts and approaches.
(A) De novo protein design is formulated as an optimization problem: Given a design objective (a protein with a desired shape and function), find one or more amino acid sequences that have that structure and function. Most design methods divide the process into two steps: First, a structure containing only the polypeptide backbone is generated, and then a sequence is designed for that backbone. For each step, design methods that use atomistic modeling (blue) or AI-based approaches (orange) are indicated. (B) Classical design methods use a “blueprint” defining a protein fold topology (identity and order of secondary structure elements) and then assemble a 3-dimensional backbone from ideal helix, strand, and loop peptide fragments. (C) Backbone generation methods can systematically sample geometries (positions, orientations, and sizes of secondary structure elements with varied connecting loops) within a given fold. These methods generate synthetic fold families that, just like evolved protein families, can be optimized for diverse functions. (D) A recent AI-based method, protein diffusion, generates protein backbones through a denoising process from random backbone starting coordinates. This method generates diverse protein folds without having to pre-specify a topology as input.
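
The optimization framing in panel (A) can be illustrated with a toy fixed-backbone sequence search. This is a minimal sketch under loud assumptions, not a method from the review: the backbone is reduced to a hypothetical buried/exposed flag per position, the objective simply rewards hydrophobic residues at buried positions and polar residues at exposed ones, and the search is a plain Metropolis Monte Carlo. All names (`score`, `design_sequence`, `HYDROPHOBIC`) are illustrative.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: a crude hydrophobic/polar split, chosen only for illustration.
HYDROPHOBIC = set("AVILMFWC")


def score(sequence, buried):
    """Toy objective: count positions whose residue class matches burial.

    A buried position scores 1 if it holds a hydrophobic residue;
    an exposed position scores 1 if it holds a polar residue.
    """
    return sum(
        1
        for aa, is_buried in zip(sequence, buried)
        if (aa in HYDROPHOBIC) == is_buried
    )


def design_sequence(buried, steps=2000, temperature=0.5, seed=0):
    """Metropolis Monte Carlo over sequences for a fixed burial pattern."""
    rng = random.Random(seed)
    current = [rng.choice(AMINO_ACIDS) for _ in buried]
    current_score = score(current, buried)
    best, best_score = list(current), current_score

    for _ in range(steps):
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice(AMINO_ACIDS)  # propose a point mutation
        new_score = score(current, buried)
        delta = new_score - current_score
        # Accept improvements always; accept worse moves with Boltzmann odds.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current_score = new_score
            if current_score > best_score:
                best, best_score = list(current), current_score
        else:
            current[pos] = old  # reject: restore the previous residue

    return "".join(best), best_score


if __name__ == "__main__":
    # Hypothetical burial pattern for a 12-residue stretch.
    pattern = [True, False, False, True, True, False,
               False, True, False, False, True, True]
    sequence, value = design_sequence(pattern)
    print(sequence, value, "of", len(pattern))
```

Real design methods replace the toy objective with an atomistic energy function or a learned model, and the burial flags with a full three-dimensional backbone; only the overall search structure carries over.
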
Figure 3. De novo design of molecular functions.
(A) General approach to design molecular functions. (B-C) Design of proteins binding to small molecules, using classical design methods (B) that place target binding sites into pre-generated protein scaffolds, or AI-based approaches (C) that generate new protein backbones around a binding site motif or target. (D-F) Design of proteins binding to target proteins (blue shapes). Regions that are optimized by sequence design are shown as dark red shapes. (D) Rotamer interaction field approach. Specific interactions with a target protein surface are identified through docking of disembodied side chains, yielding an interaction field into which pre-existing scaffolds are docked and optimized. (E) Fingerprint approach. Interaction sites on the target are identified by predicting interaction fingerprints using the MaSIF deep-learning method, followed by identification of complementary fingerprints from a library of >400 million patches. Matching patches are then scaffolded into de novo proteins and optimized. (F) Diffusion approach. AI-based protein diffusion is used to generate a binding protein with shape complementarity to a pre-specified hotspot on the target. A second step assigns a sequence to the diffused binder backbone.
Figure 4. De novo design to control cellular functions.
(A) Computational design of small-molecule sensors that couple auxin ligand binding to conformational change and fluorescence resonance energy transfer (FRET) (left) or metabolite-induced protein-protein dimerization to split reporter complementation (right). (B) Different quantitative behaviors for chemically induced dimerization (CID) systems. Top: “ratchet” mechanism, where ligand binding leads to a conformational change in one protein that creates a composite binding interface for the second protein. Bottom: “molecular glue” mechanism, where the small molecule can bind either partner. This mechanism can lead to “bandpass filter” behavior, where complex formation is low at high ligand concentrations because each of the two protein partners is bound by a different ligand molecule. (C) Mechanism of the de novo designed LOCKR system, where an output element is buried but can be displaced by a competing key element, leading to an output. (D) Application of the Co-LOCKR system to perform logic operations based on the composition of receptors present on the cell surface.
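
The “bandpass filter” behavior described in panel (B) can be made concrete with a toy equilibrium calculation. This is a sketch under stated assumptions, not a model from the review: the ligand is taken to be in excess, it binds each partner independently with hypothetical dissociation constants `kd_a` and `kd_b`, and the probability that exactly one partner carries a ligand is used as a proxy for the ability to form the ligand-bridged complex. The function name `glue_complex_proxy` is illustrative.

```python
def glue_complex_proxy(ligand, kd_a, kd_b):
    """Probability that exactly one of the two partners is ligand-bound.

    Assumes independent binding with ligand in excess, so each partner's
    occupancy is ligand / (ligand + kd). A ligand-bridged complex needs one
    occupied partner and one free partner, which is why the proxy falls
    again once both partners are saturated.
    """
    occupancy_a = ligand / (ligand + kd_a)
    occupancy_b = ligand / (ligand + kd_b)
    return occupancy_a * (1 - occupancy_b) + occupancy_b * (1 - occupancy_a)


if __name__ == "__main__":
    # Scan ligand concentration over four orders of magnitude
    # (arbitrary units, with both dissociation constants set to 1).
    for ligand in (0.01, 0.1, 1.0, 10.0, 100.0):
        print(ligand, round(glue_complex_proxy(ligand, 1.0, 1.0), 4))
```

With equal dissociation constants the proxy peaks where the ligand concentration equals the dissociation constant and falls off symmetrically on a logarithmic scale at lower and higher concentrations, the bell shape the caption refers to. A quantitative treatment would also track protein concentrations and any cooperativity in the ternary complex.
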
