Review
Crit Rev Biochem Mol Biol. 2018 Feb;53(1):1-28.
doi: 10.1080/10409238.2017.1380596. Epub 2017 Oct 4.

Finding the needle in the haystack: towards solving the protein-folding problem computationally

Bian Li et al. Crit Rev Biochem Mol Biol. 2018 Feb.

Abstract

Prediction of protein tertiary structures from amino acid sequence and understanding the mechanisms of how proteins fold, collectively known as "the protein folding problem," has been a grand challenge in molecular biology for over half a century. Theories have been developed that provide us with an unprecedented understanding of protein folding mechanisms. However, computational simulation of protein folding is still difficult, and prediction of protein tertiary structure from amino acid sequence is an unsolved problem. Progress toward a satisfying solution has been slow due to challenges in sampling the vast conformational space and deriving sufficiently accurate energy functions. Nevertheless, several techniques and algorithms have been adopted to overcome these challenges, and the last two decades have seen exciting advances in enhanced sampling algorithms, computational power and tertiary structure prediction methodologies. This review aims at summarizing these computational techniques, specifically conformational sampling algorithms and energy approximations that have been frequently used to study protein-folding mechanisms or to de novo predict protein tertiary structures. We hope that this review can serve as an overview on how the protein-folding problem can be studied computationally and, in cases where experimental approaches are prohibitive, help the researcher choose the most relevant computational approach for the problem at hand. We conclude with a summary of current challenges faced and an outlook on potential future directions.

Keywords: Protein-folding problem; conformational sampling algorithms; protein energy approximations; protein structure prediction; protein-folding simulation; sparse experimental data.

Conflict of interest statement

Disclosure statement

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of this article.

Figures

Figure 1.
The number of articles published each year (1973–2016) with the phrase “protein structure prediction” or “protein folding” in the title, abstract, or author keywords. The data were taken from Web of Science. A color version of this figure is available online at www.tandfonline.com/ibmg.
Figure 2.
Schematic three-dimensional surface rendering of a hypothetical folding funnel diagram and a (Gibbs) free energy landscape to reference state. (A) A folding funnel diagram is a pictorial representation of the counteracting nature of the two thermodynamic variables in protein folding, effective energy and configurational entropy, and explains how the Levinthal paradox is resolved (Karplus 2011). The effective energy is plotted vertically and the configurational entropy horizontally. The funneled shape stems from the fact that the number of accessible configurations, which determines the configurational entropy, decreases as the native state of a protein is approached (Karplus 2011). (B) A free energy landscape maps conformations to free energies. The global minimum on the landscape corresponds to the conformation of the native state, and local minima correspond to partially unfolded states, which are separated from the native state by free energy barriers. Note that real free energy landscapes are high-dimensional and extremely rugged. A color version of this figure is available online at www.tandfonline.com/ibmg.
Figure 3.
Folding time scales accessible to MD simulations have increased exponentially since Duan and Kollman used MD simulations in explicit solvent to study the process through which the villin headpiece reaches a marginally stable state (Duan and Kollman 1998). Shown are proteins simulated using unbiased, all-atom MD simulations in empirical force fields reported in the literature. Here, an accessible folding time scale is defined as one within which folding events are observed in MD simulations of folding from unfolded states. By this definition, whether the ~10 ms folding time of ACBP is already accessible needs to be confirmed by further simulations, as no folding events were observed in any of the trajectories used to construct a Markov state model of the ACBP-folding reaction (Voelz et al. 2012). Adapted, with permission, from Lane et al. (2013). See Lane et al. (2013) for the reference to each folding simulation highlighted in the figure. A color version of this figure is available online at www.tandfonline.com/ibmg.
Figure 4.
A sketch of the process of replica exchange molecular dynamics (REMD) and that of metadynamics. REMD: a set of noninteracting replicas (T1 through T4 in this illustration) each run at a different temperature. Each color represents a single replica. As the simulation proceeds, each replica walks up and down in temperature. In an efficient REMD simulation, replicas at neighboring temperatures are swapped (shown as double-headed arrows) based on the Metropolis criterion, and all replicas experience swapping. Metadynamics: this illustrative system has two minima, A and B (gray curve). The system trapped in B is lifted by progressive deposition of repulsive Gaussian kernels (green curve), and the free energy landscape changes accordingly (blue dashed curve). After B is filled up, the system moves into A, which is filled up similarly. When the simulation completes, the negative of the green curve gives a first rough estimate of the free energy landscape of the system. A color version of this figure is available online at www.tandfonline.com/ibmg.
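The two ideas in this caption can be illustrated with a minimal Python sketch. It assumes the standard replica exchange acceptance rule, min(1, exp[(β_i − β_j)(E_i − E_j)]) with β = 1/(k_B T), and a bias built as a sum of Gaussian kernels along a single collective variable. The function names, the energy units, and the one-dimensional collective variable are illustrative assumptions, not details taken from the review.

```python
import math
import random

# Boltzmann constant in kcal/(mol*K); the choice of units is an assumption.
K_B = 0.0019872041


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis probability of exchanging the configurations of replicas i and j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # A favorable exchange is always accepted; otherwise accept with exp(delta).
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Draw a random number and decide whether the exchange is accepted."""
    return rng.random() < swap_probability(energy_i, temp_i, energy_j, temp_j)


def metadynamics_bias(s, centers, height, width):
    """Bias at collective-variable value s from Gaussians deposited at `centers`.

    After enough deposition, -metadynamics_bias(s, ...) serves as a rough
    estimate of the free energy along s (up to an additive constant).
    """
    return sum(
        height * math.exp(-((s - c) ** 2) / (2.0 * width ** 2))
        for c in centers
    )
```

In this sketch a swap that hands the lower-energy configuration to the colder replica is always accepted, which is what lets replicas walk up and down the temperature ladder.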
Figure 5.
Monte Carlo simulated annealing and genetic operations in genetic algorithms. (A) A Monte Carlo simulated annealing procedure allows the system to “freely” navigate on the free energy surface. For example, the transition from state 4 to state 5 would be prohibitive for MD simulations because of the high energy barrier separating them. (B) In genetic algorithms, conformations are encoded as bit strings (or real-valued arrays) called chromosomes. A mutation operation flips the bit value at a randomly selected site, whereas a crossover operation takes a pair of chromosomes and exchanges the parts of the chromosomes split at a randomly selected crossover site. A color version of this figure is available online at www.tandfonline.com/ibmg.
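The operations described in this caption can be sketched in a few lines of Python. The sketch assumes chromosomes are lists of 0/1 values and uses the usual Metropolis rule for accepting a Monte Carlo move at a given annealing temperature. All function names and the reduced units (temperature expressed in energy units) are illustrative assumptions.

```python
import math
import random


def accept_move(delta_energy, temperature, rng=random):
    """Metropolis test: downhill moves always pass, uphill moves pass with exp(-dE/T)."""
    if delta_energy <= 0.0:
        return True
    return rng.random() < math.exp(-delta_energy / temperature)


def mutate(chromosome, site):
    """Return a copy of the bit-string chromosome with the bit at `site` flipped."""
    flipped = list(chromosome)
    flipped[site] = 1 - flipped[site]
    return flipped


def crossover(parent_a, parent_b, site):
    """Exchange the tails of two chromosomes split at `site`."""
    child_a = parent_a[:site] + parent_b[site:]
    child_b = parent_b[:site] + parent_a[site:]
    return child_a, child_b


def random_mutation(chromosome, rng=random):
    """Flip the bit at a randomly selected site."""
    return mutate(chromosome, rng.randrange(len(chromosome)))


def random_crossover(parent_a, parent_b, rng=random):
    """Cross over at a randomly selected interior site (equal-length parents assumed)."""
    return crossover(parent_a, parent_b, rng.randrange(1, len(parent_a)))
```

Lowering `temperature` over successive steps makes uphill moves progressively rarer, which is the annealing part of the procedure.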
Figure 6.
Cooperative effects of energy functions and sparse restraints on a hypothetical protein. (A) The energy function has two comparable minima and therefore cannot distinguish decoy D1 from the native state N. (B) A scenario in which decoy D1 violates some restraints and is thus penalized by the restraint score. However, because sparse restraints by themselves are insufficient to completely determine the protein’s structure, there exist decoys, such as D2, that satisfy the restraints as well as the native state N does. (C) Adding a restraint score to the energy function results in what is called a pseudo-energy function, which, in an ideal scenario, would be able to tell decoys apart from the native state. (D) The real free energy surface of the protein. A color version of this figure is available online at www.tandfonline.com/ibmg.
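The scenario in panels (A) through (C) can be made concrete with a toy Python sketch. It assumes the restraint score is a sum of squared violations of distance upper bounds and that the pseudo-energy is the energy plus a weighted restraint score. The energies, distances, bounds, and weight below are made-up illustrative numbers, not data from the review.

```python
def restraint_penalty(distances, upper_bounds):
    """Sum of squared violations of distance upper bounds (zero if all are satisfied)."""
    return sum(max(0.0, d - u) ** 2 for d, u in zip(distances, upper_bounds))


def pseudo_energy(energy, distances, upper_bounds, weight=1.0):
    """Energy function plus a weighted restraint score."""
    return energy + weight * restraint_penalty(distances, upper_bounds)


# Hypothetical sparse restraints: two distance upper bounds, in angstroms.
UPPER_BOUNDS = [10.0, 12.0]

# Hypothetical models: (energy, restrained distances measured in the model).
MODELS = {
    "N": (-10.0, [9.0, 11.0]),    # native: low energy, restraints satisfied
    "D1": (-10.0, [13.0, 14.0]),  # decoy with native-like energy, restraints violated
    "D2": (-6.0, [9.5, 11.5]),    # decoy that satisfies restraints, higher energy
}

scores = {
    name: pseudo_energy(energy, distances, UPPER_BOUNDS)
    for name, (energy, distances) in MODELS.items()
}
best = min(scores, key=scores.get)
print(best, scores)
```

With these numbers the energy alone cannot separate N from D1 and the restraints alone cannot separate N from D2, but the combined score ranks N first.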
Figure 7.
Highlights of de novo structure prediction in CASP experiments. Predicted structure models (rainbow) are superimposed with the crystal structures (gray). (A) Rosetta-predicted structure model superimposed with a crystal structure (PDB code: 1whz) of CASP6 target T0281, hypothetical protein from Thermus thermophilus HB8. This model is astonishingly close to the crystal structure, with a Cα-RMSD of 1.6 Å. (B) I-TASSER-predicted structure model superimposed with a crystal structure (PDB code: 4dkc) for the CASP10 ROLL target R0007, interleukin-34 protein from Homo sapiens. (C) Superposition of a QUARK-predicted structure model with a crystal structure (PDB code: 5tf3) of the CASP11 target T0837, hypothetical protein YPO2654 from Yersinia pestis. This model has a Cα-RMSD of 2.9 Å from the crystal structure. (D) Superposition of a BCL::Fold-predicted structure model with a solution NMR structure (PDB code: 2mq8) of CASP11 target T0769, a de novo designed protein LFR11 with ferredoxin fold. Although this target is in the template-based modeling category, BCL::Fold assembled models for it without relying on any homologous templates. A color version of this figure is available online at www.tandfonline.com/ibmg.
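The Cα-RMSD values quoted in this caption measure the root-mean-square distance between matched Cα atoms of a model and the experimental structure. A minimal Python sketch follows. It assumes the two coordinate sets are already optimally superimposed and listed in the same residue order; the superposition step itself (for example, with the Kabsch algorithm) is not shown, and the function name is illustrative.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) Calpha coordinates.

    Assumes the structures are already superimposed and residue-matched.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))
```

Without the prior superposition this function reports a plain coordinate RMSD, which is generally larger than the minimized value reported in structure prediction assessments.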

References

    1. Abeysinghe S, Ju T, Baker ML, Chiu W. 2008. Shape modeling and matching in identifying 3D protein structures. Computer-Aided Design. 40:708–720.
    2. Alexander N, Bortolus M, Al-Mestarihi A, Mchaourab H, Meiler J. 2008. De novo high-resolution protein structure determination from sparse spin labeling EPR data. Structure. 16:181–195.
    3. Altenbach C, Froncisz W, Hemker R, Mchaourab H, Hubbell WL. 2005. Accessibility of nitroxide side chains: absolute Heisenberg exchange rates from power saturation EPR. Biophys J. 89:2103–2112.
    4. Amir ED, Kalisman N, Keasar C. 2008. Differentiable, multidimensional, knowledge-based energy terms for torsion angle probabilities and propensities. Proteins. 72:62–73.
    5. Anfinsen CB. 1973. Principles that govern the folding of protein chains. Science. 181:223–230.
