Automated structure refinement of macromolecular assemblies from cryo-EM maps using Rosetta

eLife. 2016 Sep 26:5:e17219.

doi: 10.7554/eLife.17219.

Ray Yu-Ruei Wang^{1,2}, Yifan Song², Benjamin A Barad^{3,4}, Yifan Cheng^{5,6}, James S Fraser³, Frank DiMaio^{2,7}

Affiliations

¹ Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, United States.
² Department of Biochemistry, University of Washington, Seattle, United States.
³ Department of Bioengineering and Therapeutic Science, University of California, San Francisco, San Francisco, United States.
⁴ Graduate Group in Biophysics, University of California, San Francisco, San Francisco, United States.
⁵ Keck Advanced Microscopy Laboratory, University of California, San Francisco, San Francisco, United States.
⁶ Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, United States.
⁷ Institute for Protein Design, University of Washington, Seattle, United States.

PMID: 27669148
PMCID: PMC5115868
DOI: 10.7554/eLife.17219

Ray Yu-Ruei Wang et al. Elife. 2016.

Abstract

Cryo-EM has revealed the structures of many challenging yet exciting macromolecular assemblies at near-atomic resolution (3–4.5 Å), providing biological phenomena with molecular descriptions. However, at these resolutions, accurately positioning individual atoms remains challenging and error-prone. Manually refining thousands of amino acids, as is typical in a macromolecular assembly, is tedious and time-consuming. We present an automated method that can improve the atomic details in models that are manually built in near-atomic-resolution cryo-EM maps. Applying the method to three systems recently solved by cryo-EM, we are able to improve model geometry while maintaining the fit-to-density. Backbone placement errors are automatically detected and corrected, and the refinement shows a large radius of convergence. The results demonstrate that the method is amenable to structures with symmetry, of very large size, and containing RNA as well as covalently bound ligands. The method should streamline the cryo-EM structure determination process, providing accurate and unbiased atomic structure interpretation of such maps.

Keywords: Rosetta; atomic models; biophysics; computational biology; cryo-EM; macromolecular assemblies; membrane proteins; structural biology; structure refinement; systems biology.

Conflict of interest statement

YS: Co-founder of Cyrus Biotechnology, Inc., which will develop and market graphic-interface software for using Rosetta. The other authors declare that no competing interests exist.

Figures

**Figure 1. An overview of the three stages of automated refinement.**
(Left) In stage 1, problematic regions are predicted using a newly developed error predictor that looks for local strain in the model and poor local density-fit. These selected regions are subject to iterative fragment-based rebuilding within a Monte Carlo sampling trajectory. Refinement in this stage is restricted to one half of the data, referred to as the training map. (Middle) In stage 2, the best models from the ~5000 independent Monte Carlo trajectories are selected. Models are selected first on agreement to the validation map (independently constructed from the other half of the data), then on model geometry as assessed by MolProbity, and finally on agreement to the full reconstruction. At this point, the selected models should in general have good fit-to-density and good geometry without overfitting to the data. (Right) In stage 3, the 10 best selected models are optimized against the full reconstruction. The two half maps are first used to choose the optimal density weight for refinement against the full reconstruction. The top 10 models are then optimized (without large-scale backbone rebuilding) into the full reconstruction, alternating iteratively with voxel-size refinement. Finally, these models are subject to B-factor refinement. **DOI:** http://dx.doi.org/10.7554/eLife.17219.002
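
The stage 2 selection described in this caption is a sequential filter, which can be sketched as follows. This is a minimal illustration only: the `Model` fields, the function name `select_models`, and the fractions kept at each step (`keep_validation`, `keep_geometry`) are assumptions made for the example; the caption states only the order of the criteria and that 10 models are carried into stage 3.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    fsc_validation: float  # agreement with the validation half map (higher is better)
    molprobity: float      # MolProbity score (lower is better)
    fsc_full: float        # agreement with the full reconstruction (higher is better)


def select_models(models, keep_validation=0.2, keep_geometry=0.2, n_final=10):
    """Filter by validation-map fit, then geometry, then full-map fit.

    The kept fractions are hypothetical placeholders, not values from the paper.
    """
    def top(items, key, fraction, reverse):
        n = max(1, int(len(items) * fraction))
        return sorted(items, key=key, reverse=reverse)[:n]

    by_validation = top(models, lambda m: m.fsc_validation, keep_validation, True)
    by_geometry = top(by_validation, lambda m: m.molprobity, keep_geometry, False)
    return sorted(by_geometry, key=lambda m: m.fsc_full, reverse=True)[:n_final]


# Toy pool of 100 models with made-up scores.
pool = [
    Model(f"m{i}", fsc_validation=i / 100,
          molprobity=(i * 7 % 10) / 10 + 1,
          fsc_full=(i * 3 % 11) / 11)
    for i in range(100)
]
picked = select_models(pool, n_final=3)
```

Filtering on the validation map first means that geometry and full-map agreement are only compared among models that already generalize to data they were not refined against.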

**Figure 1—figure supplement 2. Incorporating model strain improves error detection.**
Guided by the 3.3-Å 20S proteasome reconstruction, we evaluated 500 models against the high-resolution crystal structure. We plot here the precision (y-axis) and recall (x-axis) of predicting which residues were incorrectly placed (RMS > 1 Å). Use of density alone (pink line) is outperformed by using a combination of density and model strain (blue line). Our refinement approach considers four points on this curve when picking density + model strain cutoffs, indicated on the plot with 'Stage1–4'. **DOI:** http://dx.doi.org/10.7554/eLife.17219.004
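
The precision and recall plotted in this supplement can be computed per model roughly as below. This is a sketch under stated assumptions: the function name, the dictionary layout, and the toy numbers are hypothetical; only the definition of an incorrectly placed residue (RMS > 1 Å to the crystal structure) comes from the caption.

```python
def precision_recall(predicted_bad, rms_to_reference, rms_cutoff=1.0):
    """Score an error predictor against a reference structure.

    predicted_bad: set of residue ids flagged by the predictor.
    rms_to_reference: dict mapping residue id -> RMS deviation (in Å).
    A residue counts as truly incorrect when its RMS exceeds rms_cutoff.
    """
    truly_bad = {res for res, rms in rms_to_reference.items() if rms > rms_cutoff}
    true_pos = len(predicted_bad & truly_bad)
    precision = true_pos / len(predicted_bad) if predicted_bad else 1.0
    recall = true_pos / len(truly_bad) if truly_bad else 1.0
    return precision, recall


# Made-up example: residues 2, 3 and 5 are truly misplaced; 2, 3 and 4 are flagged.
rms = {1: 0.3, 2: 1.5, 3: 2.0, 4: 0.8, 5: 1.2}
precision, recall = precision_recall({2, 3, 4}, rms)
```

Sweeping the predictor's cutoff and recording one (recall, precision) pair per cutoff traces out a curve of the kind shown in the plot.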

**Figure 1—figure supplement 3. Density weight optimization against half maps for Mitoribosome.**
Before refinement against the full reconstruction, we optimize the weight on the 'fit-to-density' energy using half maps, to avoid overfitting. We plot several key metrics here as a function of the weight on the fit-to-density score term (x-axis), including the Fourier Shell Correlation (FSC) 'overfitting' (FSCwork-free, top histogram), the Rosetta energy (second histogram), and several MolProbity model geometry terms (histograms 3–6). In all cases, we see a sharp inflection point at which overfitting increases and geometry gets notably worse. As a general rule of thumb, we use the weight maximizing FSCfree − (0.04 × per-residue energy) to capture this inflection point. **DOI:** http://dx.doi.org/10.7554/eLife.17219.005
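
The rule of thumb in this caption, choosing the weight that maximizes FSCfree minus 0.04 times the per-residue energy, can be written as a one-line selection. The sketch below assumes each trial weight has already been refined and scored; the function name, the tuple layout, and the trial values are hypothetical and for illustration only.

```python
def pick_density_weight(trials, energy_penalty=0.04):
    """Return the weight maximizing fsc_free - energy_penalty * per_residue_energy.

    trials: list of (weight, fsc_free, per_residue_energy) tuples,
    one per trial refinement against the training half map.
    """
    return max(trials, key=lambda t: t[1] - energy_penalty * t[2])[0]


# Made-up scan: fit keeps improving slightly while energy degrades at high weight.
scan = [
    (10, 0.60, -2.0),
    (20, 0.66, -1.9),
    (35, 0.68, -1.5),
    (50, 0.67, -0.5),
]
best_weight = pick_density_weight(scan)
```

In this toy scan the objective peaks at the weight of 35, just before the energy rises sharply, which is the inflection-point behavior the caption describes.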

**Figure 1—figure supplement 4. Model geometry is improved with a separate pre-proline potential.**
Refined models initially had poor pre-proline geometry. Thus, a new backbone torsional potential was created which separately treats pre-proline and pre-non-proline residues. In the plot, we show the old potential (left), the new pre-non-proline potential (middle), and the pre-proline potential (right) for three different residue identities. The color indicates the unweighted energy values, using the key on the right. **DOI:** http://dx.doi.org/10.7554/eLife.17219.006

**Figure 2. The accuracy of voxel size refinement and the effect of B-factor sharpening in Rosetta refinement.**
(A) Voxel-size refinement on perturbed models. Perturbed structures were generated by running short MD trajectories in Rosetta, followed by all-atom minimization. Voxel size is refined against the perturbed models, yielding the density distribution in red. Following cycles of iterated voxel refinement and all-atom refinement, the voxel size shows significantly better convergence (blue line). (B) Rosetta structure refinement with a range of values of B-factor sharpening. Integrated Fourier Shell Correlation evaluated using the validation map (free-iFSC) is plotted here as a function of B-factor sharpening of the training map. The results indicate that our refinement method is not particularly sensitive to the extent of B-factor sharpening, behaving similarly over a range of sharpening values between −40 and −130. The error bars show the standard deviation of the free-iFSC among the top 10 ensemble models (see Materials and methods for the ensemble selection method). **DOI:** http://dx.doi.org/10.7554/eLife.17219.007

**Figure 3. Refinement of the apo TRPV1 channel (EMD-5778) shows improved model quality.**
(A) Comparison of the deposited and Rosetta-refined models, as assessed by MolProbity. Residues reported as violations are colored using the key shown on the far right. Blue open arrows indicate that the hydrogen-bond geometry of a β-hairpin was automatically detected and improved in the Rosetta-refined model. (B) An overlay of the asymmetric unit of the deposited (pink) and the Rosetta-refined (green) model indicates the magnitude of conformational changes that are explored by our refinement approach. (C) The agreement of models to map assessed by Fourier shell correlation (y-axis) at each resolution shell (x-axis), where the reported resolution (3.4 Å) is depicted as a dashed orange line. **DOI:** http://dx.doi.org/10.7554/eLife.17219.008

**Figure 4. Refinement of the TRPV1 channel identifies a previously unmodeled disulfide bond.**
(A) An overview of the entire structure, estimating local model uncertainty in two ways: local structural diversity and refined B-factors. Local structure diversity is indicated by showing (left) an overlay of the top 10 Rosetta models, (middle) the top model colored by per residue deviation, and (right) the refined per-atom B-factors. Using the model selection method illustrated in the middle panel of Figure 1, the Cα RMSDs among the selected ensemble range from 0.44 to 0.63 Å. The orange square shows the location of a newly identified disulfide bond (C386–C390) revealed by our refinement protocol. (B) A zoomed-in view of the disulfide linkage (C386–C390) identified by the automated method. Note that the sidechain coordinates of C390 were unassigned in the deposited model; for presentation, the sidechain atoms of C390 were optimally added by Rosetta on the basis of the deposited backbone torsion angles of C390. **DOI:** http://dx.doi.org/10.7554/eLife.17219.009
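
The two ensemble measures mentioned in this caption, per-residue deviation and pairwise Cα RMSD, can be sketched as follows. The sketch assumes that all models sit in the same coordinate frame (they were refined into the same map), so no superposition is performed, and it defines per-residue deviation as the RMS distance from the ensemble mean position. Both the definition and the function names are assumptions for illustration; the caption does not state how the authors computed these values.

```python
import math


def per_residue_deviation(ensemble):
    """RMS distance of each residue's Cα from its ensemble-mean position.

    ensemble: list of models, each a list of (x, y, z) Cα coordinates,
    with residues in the same order in every model.
    """
    n_models = len(ensemble)
    deviations = []
    for atoms in zip(*ensemble):  # the same residue across all models
        mean = [sum(axis) / n_models for axis in zip(*atoms)]
        mean_sq = sum(math.dist(a, mean) ** 2 for a in atoms) / n_models
        deviations.append(math.sqrt(mean_sq))
    return deviations


def ca_rmsd(model_a, model_b):
    """Cα RMSD between two models without superposition."""
    total = sum(math.dist(a, b) ** 2 for a, b in zip(model_a, model_b))
    return math.sqrt(total / len(model_a))


# Two toy models with two residues each; only the second residue differs.
toy = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
       [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]]
deviation = per_residue_deviation(toy)
rmsd = ca_rmsd(toy[0], toy[1])
```

Residues with a large deviation across the selected models mark regions where the map constrains the model weakly, which is the sense in which the caption uses ensemble diversity as an estimate of local uncertainty.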

**Figure 5. Refinement of the F₄₂₀-reducing [NiFe] hydrogenase (EMD-2513) improves the model geometry.**
(A) An illustration comparing the model geometry of the deposited (upper panel) and Rosetta-refined (lower panel) models. Three chains (A/B/C) of the asymmetric unit of the complex are shown as cartoon with geometry violations reported by MolProbity colored according to the key shown on the far right. Four iron–sulfur clusters [4Fe4S] and a FAD are shown in a stick representation. Metal ions are depicted as spheres, with Zn grey, Fe orange, and Ni green. (B) Model–map agreement – as assessed by Fourier shell correlation (y-axis) as a function of resolution (x-axis) – quantifies this improvement following voxel-size refinement. (C) Model quality as assessed by EMRinger and MolProbity. The x-axis shows methods used to evaluate the models, while the y-axis shows the scores under each criterion. **DOI:** http://dx.doi.org/10.7554/eLife.17219.011

**Figure 5—figure supplement 1. The symmetry operators denoted in the deposited PDB (PDB 4ci0) produce a complex that could not fit into the deposited density map properly.**
(Left panel) The symmetric complex downloaded from the Protein Data Bank as a biounit is shifted out of the deposited density map. The middle and right panels show a zoomed-in view of two regions in the deposited models corresponding to the helix and the sheet indicated by the orange and cyan squares, respectively, in the left panel. **DOI:** http://dx.doi.org/10.7554/eLife.17219.012

**Figure 6. Refinement of the large subunit of the human mitochondrial ribosome (EMD-2762) shows improvements to all subunits.**
(A) Scatterplots of model quality for each of the 48 protein chains compare the deposited (x-axis) and Rosetta (y-axis) models using MolProbity. On the left, the MolProbity scores of all 48 protein chains are compared, where lower values indicate better model geometry. On the right, the percentages of 'Ramachandran favored' residues on each chain are compared, with higher values preferable. (B) An evaluation of the fit-to-density of each protein chain. On the left, we compare the Fourier shell correlation (FSC) of each chain before and after refinement; we integrate the FSC from 10 Å to 3.4 Å. Higher values indicate better agreement with the data. The largest improvement, chain k, is indicated by the red arrow. On the right, we show the full FSC curve, with the deposited model shown in pink and the Rosetta-refined model shown in green; the reported map resolution (3.4 Å) is indicated by the dashed orange line. (C) A zoomed-in view indicating a much improved backbone geometry and the large radius of convergence of the refinement of chain k. The left panel shows that the density for chain k lies in a region of relatively low local resolution. **DOI:** http://dx.doi.org/10.7554/eLife.17219.013
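
Integrating an FSC curve between 10 Å and 3.4 Å, as described for panel B, can be sketched with a trapezoid rule over spatial frequency (1/Å). This is only an illustration: the caption gives the integration limits but not the numerical scheme, so the trapezoid rule, the function name, and the toy curve are assumptions.

```python
def integrated_fsc(shells, low_res=10.0, high_res=3.4):
    """Integrate FSC over spatial frequency between two resolution limits.

    shells: list of (resolution_in_angstrom, fsc) pairs, in any order.
    Shells outside [low_res, high_res] are ignored; the trapezoid rule
    is an assumed scheme, not necessarily the one used by the authors.
    """
    lo, hi = 1.0 / low_res, 1.0 / high_res
    points = sorted((1.0 / res, fsc) for res, fsc in shells if lo <= 1.0 / res <= hi)
    return sum((f2 - f1) * (c1 + c2) / 2.0
               for (f1, c1), (f2, c2) in zip(points, points[1:]))


# Made-up FSC curve; the 20 Å and 3.0 Å shells fall outside the integration range.
curve = [(20.0, 0.99), (10.0, 0.9), (5.0, 0.7), (4.0, 0.5), (3.4, 0.3), (3.0, 0.1)]
ifsc = integrated_fsc(curve)
```

Because the result is a single number per chain, it allows the before-and-after comparison across all 48 chains that the left-hand plot of panel B shows.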

**Figure 6—figure supplement 1. Local relax shows better placement of sidechains for large systems.**
In the case of the mitoribosome, refinement of a particularly well-resolved region in the map (left) led to sidechains that are clearly misaligned with the density (middle). This was due to the poor convergence of our Monte Carlo sidechain placing approach when applied to systems with more than 1000 residues. Our alternative approach, LocalRelax, which performs many local sidechain optimizations, correctly places sidechains in a way that is consistent with density (right). **DOI:** http://dx.doi.org/10.7554/eLife.17219.015

**Figure 6—figure supplement 2. EMRinger analysis on refinement of the large subunit of the human mitochondrial ribosome.**
A scatterplot of model quality, as assessed by EMRinger, for each of the 48 protein chains compares the deposited (x-axis) and Rosetta (y-axis) models. **DOI:** http://dx.doi.org/10.7554/eLife.17219.016

Cited by

Unveiling the stochastic nature of human heteropolymer ferritin self-assembly mechanism.
Bou-Abdallah F, Fish J, Terashi G, Zhang Y, Kihara D, Arosio P. Protein Sci. 2024 Aug;33(8):e5104. doi: 10.1002/pro.5104. PMID: 38995055
Structural basis of the human NAIP/NLRC4 inflammasome assembly and pathogen sensing.
Matico RE, Yu X, Miller R, Somani S, Ricketts MD, Kumar N, Steele RA, Medley Q, Berger S, Faustin B, Sharma S. Nat Struct Mol Biol. 2024 Jan;31(1):82-91. doi: 10.1038/s41594-023-01143-z. Epub 2024 Jan 4. PMID: 38177670 Free PMC article.
Structural basis of Frizzled 4 in recognition of Dishevelled 2 unveils mechanism of WNT signaling activation.
Qian Y, Ma Z, Xu Z, Duan Y, Xiong Y, Xia R, Zhu X, Zhang Z, Tian X, Yin H, Liu J, Song J, Lu Y, Zhang A, Guo C, Jin L, Kim WJ, Ke J, Xu F, Huang Z, He Y. Nat Commun. 2024 Sep 2;15(1):7644. doi: 10.1038/s41467-024-52174-z. PMID: 39223191 Free PMC article.
Atomic structure of the predominant GII.4 human norovirus capsid reveals novel stability and plasticity.
Hu L, Salmen W, Chen R, Zhou Y, Neill F, Crowe JE Jr, Atmar RL, Estes MK, Prasad BVV. Nat Commun. 2022 Mar 10;13(1):1241. doi: 10.1038/s41467-022-28757-z. PMID: 35273142 Free PMC article.
Multivalent designed proteins neutralize SARS-CoV-2 variants of concern and confer protection against infection in mice.
Hunt AC, Case JB, Park YJ, Cao L, Wu K, Walls AC, Liu Z, Bowen JE, Yeh HW, Saini S, Helms L, Zhao YT, Hsiang TY, Starr TN, Goreshnik I, Kozodoy L, Carter L, Ravichandran R, Green LB, Matochko WL, Thomson CA, Vögeli B, Krüger A, VanBlargan LA, Chen RE, Ying B, Bailey AL, Kafai NM, Boyken SE, Ljubetič A, Edman N, Ueda G, Chow CM, Johnson M, Addetia A, Navarro MJ, Panpradist N, Gale M Jr, Freedman BS, Bloom JD, Ruohola-Baker H, Whelan SPJ, Stewart L, Diamond MS, Veesler D, Jewett MC, Baker D. Sci Transl Med. 2022 May 25;14(646):eabn1252. doi: 10.1126/scitranslmed.abn1252. Epub 2022 May 25. PMID: 35412328 Free PMC article.

Grants and funding

R01 GM098672/GM/NIGMS NIH HHS/United States
