Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs

Nat Methods. 2019 Nov;16(11):1153-1160. doi: 10.1038/s41592-019-0575-8. Epub 2019 Oct 7.

Tristan Bepler¹,², Andrew Morin²,³, Micah Rapp⁴,⁵, Julia Brasch⁴,⁵, Lawrence Shapiro⁴, Alex J Noble⁶, Bonnie Berger⁷,⁸

Affiliations

¹ Computational and Systems Biology, MIT, Cambridge, MA, USA.
² Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA.
³ Department of Mathematics, MIT, Cambridge, MA, USA.
⁴ Department of Biochemistry and Molecular Biophysics, Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA.
⁵ National Resource for Automated Molecular Microscopy, Simons Electron Microscopy Center, New York Structural Biology Center, New York, NY, USA.
⁶ National Resource for Automated Molecular Microscopy, Simons Electron Microscopy Center, New York Structural Biology Center, New York, NY, USA. anoble@nysbc.org.
⁷ Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA. bab@mit.edu.
⁸ Department of Mathematics, MIT, Cambridge, MA, USA. bab@mit.edu.

PMID: 31591578
PMCID: PMC6858545
DOI: 10.1038/s41592-019-0575-8

Tristan Bepler et al. Nat Methods. 2019 Nov.


Abstract

Cryo-electron microscopy is a popular method for the determination of protein structures; however, identifying a sufficient number of particles for analysis can take months of manual effort. Current computational approaches find many false positives and require ad hoc postprocessing, especially for unusually shaped particles. To address these shortcomings, we develop Topaz, an efficient and accurate particle-picking pipeline using neural networks trained with a general-purpose positive-unlabeled learning method. This framework enables particle detection models to be trained with few sparsely labeled particles and no labeled negatives. Topaz retrieves many more real particles than conventional picking methods while maintaining low false-positive rates, is capable of picking challenging unusually shaped proteins (for example, small, non-globular and asymmetric particles), produces more representative particle sets and does not require post hoc curation. We demonstrate the performance of Topaz on two difficult datasets and three conventional datasets. Topaz is modular, standalone, free and open source (http://topaz.csail.mit.edu).

Conflict of interest statement

Competing financial interests

The authors declare no competing financial interests.

Figures

**Figure 1 |**
Topaz particle picking pipeline using CNNs trained with positive and unlabeled data. **(a)** Given a set of labeled particles, a CNN is trained to classify positive and negative regions using particle locations as positive regions and all other regions as unlabeled. Labeled particles from EMPIAR-10096 are indicated by blue circles and a few positive and unlabeled regions are depicted. **(b)** Once the CNN classifier is trained, particles are predicted in two steps. First, the classifier is applied to each micrograph region to give per region predictions. Second, coordinates are extracted from the region predictions using non-maximum suppression. The left image shows a raw micrograph from EMPIAR-10096. The middle image depicts the micrograph with overlaid region predictions [blue = low confidence, red = high confidence]. The right image indicates predicted particles after using non-maximum suppression on the region predictions.
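The second prediction step in Figure 1b, extracting coordinates from per-region predictions with non-maximum suppression, can be illustrated with a minimal sketch. This is a generic greedy variant written with NumPy, not the Topaz implementation; the function name `greedy_nms`, the `radius` and `threshold` parameters, and the toy score map are all assumptions made for illustration.

```python
import numpy as np

def greedy_nms(scores, radius, threshold=0.0):
    """Greedy non-maximum suppression on a 2D score map (illustrative sketch).

    Repeatedly takes the highest-scoring remaining pixel as a pick, then
    suppresses every pixel within `radius` of it. Stops once the best
    remaining score is no longer above `threshold`.
    Returns a list of (x, y, score) tuples in decreasing score order.
    """
    scores = np.asarray(scores, dtype=float).copy()
    height, width = scores.shape
    yy, xx = np.mgrid[0:height, 0:width]
    picks = []
    while True:
        y, x = np.unravel_index(np.argmax(scores), scores.shape)
        best = scores[y, x]
        if not best > threshold:
            break
        picks.append((int(x), int(y), float(best)))
        # Suppress the pick itself and all neighbours inside the radius.
        suppressed = (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
        scores[suppressed] = -np.inf
    return picks

# Toy score map (hypothetical values): two nearby peaks and one distant peak.
demo = np.zeros((10, 10))
demo[2, 2] = 5.0
demo[2, 3] = 4.0   # within radius of the stronger peak, so it is suppressed
demo[7, 7] = 3.0
picks = greedy_nms(demo, radius=2)
print(picks)
```

In practice the suppression radius would be chosen relative to the expected particle size so that one particle yields one coordinate.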

**Figure 2 |**
Reconstructions of the Toll receptor using particles picked by Topaz, template-based (Template), and DoG methods. Template and DoG particles were filtered through multiple rounds of 2D classification before analysis. Topaz particles were not filtered. **(a)** Density map using particles picked with Topaz. The global resolution is 3.70 Å at FSC_0.143 with a sphericity of 0.731. **(b)** Density map using particles picked using template picking. The global resolution is 3.92 Å at FSC_0.143 with a sphericity of 0.706. **(c)** Density map using particles picked using difference of Gaussians (DoG). The global resolution is 3.86 Å at FSC_0.143 with a sphericity of 0.652. **(d)** Quantification of picked particles for each protein view based on 2D classification. **(e)** Example micrograph (representative of >100 micrographs examined) showing Topaz picks (red circles) and protein aggregation (outlined in green). Scale bar for the top of (a) is 5 nm.

**Figure 3 |**
Single particle reconstructions from published particles, Topaz particles, and Topaz particles with published particles removed (left to right). Below each reconstruction is the corresponding 3DFSC plot. **(a)** T20S proteasome (EMPIAR-10025) using the provided aligned, dose-weighted micrographs. **(b)** 80S ribosome (EMPIAR-10028). **(c)** Rabbit muscle aldolase (EMPIAR-10215). Scale bars: 3 nm.

**Figure 4 |**
Reconstruction resolution and 2D class averages for Topaz particles at decreasing log-likelihood ratio thresholds. **(a)** Number of particles vs. reconstruction resolution for Topaz particles (increasing number of particles corresponds to decreasing log-likelihood threshold) and randomly sampled subsets of the published particle set. Resolution is as reported by cryoSPARC. For the published particle sets the mean of three replicates is marked with standard deviation shaded in grey. **(b)** Stacked bar plots show the quantification of the number of true and false positives at each threshold based on 2D class averages. Decreasing threshold corresponds to increasing number of predicted particles. True positives are colored in blue and false positives in orange. **(c)** 2D class averages obtained at each score threshold for the T20S proteasome (EMPIAR-10025). Number of particles (ptcls) and effective sample size (ess) for each class are reported by cryoSPARC. NaN is reported for classes without any particles assigned. Classes determined to be false positives are marked with orange boxes. Several classes which appear to be false positives at high score thresholds do not contain any particles and, therefore, are not highlighted.
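The relationship described in Figure 4, where lowering the score threshold admits more predicted particles, can be shown with a small sketch. It assumes each pick carries a score interpretable as a log-likelihood ratio, log(p / (1 - p)), for a predicted probability p; that reading of the score, the helper names, and the example values are assumptions for illustration only.

```python
import numpy as np

def probability_to_log_ratio(p):
    """Convert predicted probabilities to log(p / (1 - p)) scores."""
    p = np.asarray(p, dtype=float)
    return np.log(p) - np.log1p(-p)

def picks_per_threshold(scores, thresholds):
    """Count how many picks have a score at or above each threshold."""
    scores = np.asarray(scores, dtype=float)
    return {float(t): int(np.sum(scores >= t)) for t in thresholds}

# Hypothetical pick scores; thresholds are listed in decreasing order.
example_scores = [-3.0, -1.0, 0.0, 2.0, 4.0]
counts = picks_per_threshold(example_scores, thresholds=[0, -2, -6])
print(counts)
```

Each lower threshold keeps every pick retained at the higher one and adds lower-confidence picks, which is why the number of particles grows monotonically as the threshold decreases.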

**Figure 5 |**
Comparison of models trained using different objective functions with varying numbers of labeled positives on the EMPIAR-10096 and EMPIAR-10234 datasets. **(a)** Plots show the mean and standard deviation of the average-precision score for predicting positive regions in the EMPIAR-10096 and EMPIAR-10234 test set micrographs for models trained using either the naive PN, Kiryo et al.’s non-negative risk estimator (PU), our GE-KL, or our GE-binomial objective function. Each number of labeled positives was sampled 10 times independently. (*) indicates experiments in which GE-binomial achieved higher average-precision than GE-KL with p < 0.05. (†) indicates experiments in which GE-KL achieved higher average-precision than GE-binomial with p < 0.05 according to a two-sided dependent t-test. **(b)** Plots show the mean and standard deviation of the average-precision score for models trained jointly with autoencoders with different reconstruction loss weights (γ). γ=0 corresponds to training the classifier without the autoencoder. γ=10/N means the reconstruction loss is weighted by 10 divided by the number of labeled positives used to train the model.
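One of the baselines named in Figure 5, the non-negative risk estimator of Kiryo et al., can be sketched as follows. The estimator combines the risk on labeled positives with a negative-class risk estimated from unlabeled data, clamped at zero. The logistic loss, the function names, and the class prior value `pi` used here are assumptions for illustration; this is not the training code used in the paper, and the GE-KL and GE-binomial objectives are not shown.

```python
import numpy as np

def logistic_loss(logits, label):
    """Logistic loss log(1 + exp(-label * logit)) for label in {+1, -1}."""
    logits = np.asarray(logits, dtype=float)
    return np.logaddexp(0.0, -label * logits)

def nn_pu_risk(pos_logits, unl_logits, pi):
    """Non-negative PU risk: pi * R_p^+ + max(0, R_u^- - pi * R_p^-).

    pos_logits: classifier outputs on labeled positive examples.
    unl_logits: classifier outputs on unlabeled examples.
    pi: assumed fraction of positives among the unlabeled data.
    """
    risk_pos_as_pos = np.mean(logistic_loss(pos_logits, +1))
    risk_pos_as_neg = np.mean(logistic_loss(pos_logits, -1))
    risk_unl_as_neg = np.mean(logistic_loss(unl_logits, -1))
    # Unbiased estimate of the negative-class risk; it can go below zero
    # when the model overfits, so it is clamped to be non-negative.
    neg_risk = risk_unl_as_neg - pi * risk_pos_as_neg
    return float(pi * risk_pos_as_pos + max(0.0, neg_risk))

# Uninformative classifier: every logit is zero, so every loss is log(2).
flat = nn_pu_risk([0.0, 0.0], [0.0, 0.0], pi=0.5)
# Confident classifier: the negative-risk term would be negative and is clamped.
clamped = nn_pu_risk([10.0], [-10.0], pi=0.5)
print(flat, clamped)
```

The naive PN objective, by contrast, would treat every unlabeled region as a negative, which penalizes the model for correctly scoring unlabeled true particles highly.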

Cited by

Structure of the flotillin complex in a native membrane environment.
Fu Z, MacKinnon R. Proc Natl Acad Sci U S A. 2024 Jul 16;121(29):e2409334121. doi: 10.1073/pnas.2409334121. Epub 2024 Jul 10. PMID: 38985763.
Human coronavirus HKU1 recognition of the TMPRSS2 host receptor.
McCallum M, Park YJ, Stewart C, Sprouse KR, Brown J, Tortorici MA, Gibson C, Wong E, Ieven M, Telenti A, Veesler D. bioRxiv [Preprint]. 2024 Jan 9:2024.01.09.574565. doi: 10.1101/2024.01.09.574565. Update in: Cell. 2024 Aug 8;187(16):4231-4245.e13. doi: 10.1016/j.cell.2024.06.006. PMID: 38260518.
MiLoPYP: self-supervised molecular pattern mining and particle localization in situ.
Huang Q, Zhou Y, Bartesaghi A. Nat Methods. 2024 Oct;21(10):1863-1872. doi: 10.1038/s41592-024-02403-6. Epub 2024 Sep 9. PMID: 39251798.
Structural basis for accommodation of emerging B.1.351 and B.1.1.7 variants by two potent SARS-CoV-2 neutralizing antibodies.
Cerutti G, Rapp M, Guo Y, Bahna F, Bimela J, Reddem ER, Yu J, Wang P, Liu L, Huang Y, Ho DD, Kwong PD, Sheng Z, Shapiro L. Structure. 2021 Jul 1;29(7):655-663.e4. doi: 10.1016/j.str.2021.05.014. Epub 2021 Jun 9. PMID: 34111408.
Stoichiometry and architecture of the human pyruvate dehydrogenase complex.
Zdanowicz R, Afanasyev P, Pruška A, Harrison JA, Giese C, Boehringer D, Leitner A, Zenobi R, Glockshuber R. Sci Adv. 2024 Jul 19;10(29):eadn4582. doi: 10.1126/sciadv.adn4582. Epub 2024 Jul 17. PMID: 39018392.

Grants and funding

S10 OD019994/OD/NIH HHS/United States
