Nat Methods. 2019 Nov;16(11):1153-1160.
doi: 10.1038/s41592-019-0575-8. Epub 2019 Oct 7.

Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs


Tristan Bepler et al. Nat Methods. 2019 Nov.

Abstract

Cryo-electron microscopy is a popular method for the determination of protein structures; however, identifying a sufficient number of particles for analysis can take months of manual effort. Current computational approaches find many false positives and require ad hoc postprocessing, especially for unusually shaped particles. To address these shortcomings, we develop Topaz, an efficient and accurate particle-picking pipeline using neural networks trained with a general-purpose positive-unlabeled learning method. This framework enables particle detection models to be trained with few sparsely labeled particles and no labeled negatives. Topaz retrieves many more real particles than conventional picking methods while maintaining low false-positive rates, is capable of picking challenging unusually shaped proteins (for example, small, non-globular and asymmetric particles), produces more representative particle sets and does not require post hoc curation. We demonstrate the performance of Topaz on two difficult datasets and three conventional datasets. Topaz is modular, standalone, free and open source (http://topaz.csail.mit.edu).

Conflict of interest statement

Competing financial interests

The authors declare no competing financial interests.

Figures

Figure 1 |
Topaz particle picking pipeline using CNNs trained with positive and unlabeled data. (a) Given a set of labeled particles, a CNN is trained to classify positive and negative regions using particle locations as positive regions and all other regions as unlabeled. Labeled particles from EMPIAR-10096 are indicated by blue circles and a few positive and unlabeled regions are depicted. (b) Once the CNN classifier is trained, particles are predicted in two steps. First, the classifier is applied to each micrograph region to give per region predictions. Second, coordinates are extracted from the region predictions using non-maximum suppression. The left image shows a raw micrograph from EMPIAR-10096. The middle image depicts the micrograph with overlaid region predictions [blue = low confidence, red = high confidence]. The right image indicates predicted particles after using non-maximum suppression on the region predictions.
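The coordinate-extraction step described in (b) can be illustrated with a greedy non-maximum suppression over a per-region score map. This is a minimal sketch under stated assumptions, not the Topaz implementation: the function name `extract_coordinates`, the `radius` and `threshold` parameters, and the toy score map are all hypothetical and chosen only for illustration.

```python
import numpy as np

def extract_coordinates(scores, radius, threshold=0.0):
    """Greedy non-maximum suppression on a 2D score map.

    Hypothetical helper: repeatedly takes the highest remaining score above
    `threshold`, records it, and suppresses every region within `radius`
    pixels of it. Returns (row, col, score) tuples, highest score first.
    """
    scores = np.asarray(scores, dtype=float).copy()
    rows, cols = np.indices(scores.shape)
    picks = []
    while True:
        r, c = np.unravel_index(np.argmax(scores), scores.shape)
        best = scores[r, c]
        if not best > threshold:
            break  # nothing left above the threshold
        picks.append((int(r), int(c), float(best)))
        # Suppress all regions inside the disc around the accepted pick.
        near = (rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2
        scores[near] = -np.inf
    return picks

# Toy score map (made-up values): two nearby peaks and one isolated peak.
score_map = np.full((8, 8), -5.0)
score_map[2, 2] = 3.0
score_map[2, 3] = 2.5   # within radius of (2, 2), so it is suppressed
score_map[6, 5] = 1.0
picks = extract_coordinates(score_map, radius=2)
print(picks)
```

In this toy case the weaker peak next to the strongest one is suppressed, so only one coordinate is reported per particle-sized neighborhood. A sensible `radius` would presumably be tied to the particle size, which is left as an input here.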
Figure 2 |
Reconstructions of the Toll receptor using particles picked by Topaz, template-based (Template), and DoG methods. Template and DoG particles were filtered through multiple rounds of 2D classification before analysis. Topaz particles were not filtered. (a) Density map using particles picked with Topaz. The global resolution is 3.70 Å at FSC = 0.143 with a sphericity of 0.731. (b) Density map using particles picked using template picking. The global resolution is 3.92 Å at FSC = 0.143 with a sphericity of 0.706. (c) Density map using particles picked using difference of Gaussians (DoG). The global resolution is 3.86 Å at FSC = 0.143 with a sphericity of 0.652. (d) Quantification of picked particles for each protein view based on 2D classification. (e) Example micrograph (representative of >100 micrographs examined) showing Topaz picks (red circles) and protein aggregation (outlined in green). Scale bar for the top of (a) is 5 nm.
Figure 3 |
Single particle reconstructions from published particles, Topaz particles, and Topaz particles with published particles removed (left to right). Below each reconstruction is the corresponding 3DFSC plot. (a) T20S proteasome (EMPIAR-10025) using the provided aligned, dose-weighted micrographs. (b) 80S ribosome (EMPIAR-10028). (c) Rabbit muscle aldolase (EMPIAR-10215). Scale bars: 3 nm.
Figure 4 |
Reconstruction resolution and 2D class averages for Topaz particles at decreasing log-likelihood ratio thresholds. (a) Number of particles vs. reconstruction resolution for Topaz particles (increasing number of particles corresponds to decreasing log-likelihood threshold) and randomly sampled subsets of the published particle set. Resolution is as reported by cryoSPARC. For the published particle sets the mean of three replicates is marked with standard deviation shaded in grey. (b) Stacked bar plots show the quantification of the number of true and false positives at each threshold based on 2D class averages. Decreasing threshold corresponds to increasing number of predicted particles. True positives are colored in blue and false positives in orange. (c) 2D class averages obtained at each score threshold for the T20S proteasome (EMPIAR-10025). Number of particles (ptcls) and effective sample size (ess) for each class are reported by cryoSPARC. NaN is reported for classes without any particles assigned. Classes determined to be false positives are marked with orange boxes. Several classes which appear to be false positives at high score thresholds do not contain any particles and, therefore, are not highlighted.
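The relationship in (a) and (b) between the score threshold and the number of predicted particles can be sketched as a simple count over per-particle scores. This is an illustrative sketch only: the helper name `picks_per_threshold`, the score values, and the thresholds are made up and do not come from the paper.

```python
import numpy as np

def picks_per_threshold(scores, thresholds):
    """Count how many predicted particles score at or above each threshold.

    Hypothetical helper: `scores` are per-particle log-likelihood ratios.
    """
    scores = np.asarray(scores, dtype=float)
    return {float(t): int((scores >= t).sum()) for t in thresholds}

# Made-up scores for six candidate particles.
scores = np.array([4.2, 3.1, 0.5, -0.7, -2.4, -3.9])
counts = picks_per_threshold(scores, thresholds=[0, -1, -3])
print(counts)  # {0.0: 3, -1.0: 4, -3.0: 5}
```

Lowering the threshold can only add particles, which is why the particle counts in the figure increase as the threshold decreases. Whether the added particles are true or false positives has to be judged separately, as the caption describes, from 2D class averages.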
Figure 5 |
Comparison of models trained using different objective functions with varying numbers of labeled positives on the EMPIAR-10096 and EMPIAR-10234 datasets. (a) Plots show the mean and standard deviation of the average-precision score for predicting positive regions in the EMPIAR-10096 and EMPIAR-10234 test set micrographs for models trained using either the naive PN, Kiryo et al.’s non-negative risk estimator (PU), our GE-KL, or our GE-binomial objective function. Each number of labeled positives was sampled 10 times independently. (*) indicates experiments in which GE-binomial achieved higher average-precision than GE-KL with p < 0.05. (†) indicates experiments in which GE-KL achieved higher average-precision than GE-binomial with p < 0.05 according to a two-sided dependent t-test. (b) Plots show the mean and standard deviation of the average-precision score for models trained jointly with autoencoders with different reconstruction loss weights (γ). γ=0 corresponds to training the classifier without the autoencoder. γ=10/N means the reconstruction loss is weighted by 10 divided by the number of labeled positives used to train the model.
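For orientation, the non-negative risk estimator of Kiryo et al. (the "PU" baseline named in the caption) can be sketched in a few lines. This is a minimal sketch under stated assumptions: it uses a logistic loss on raw classifier logits, treats the positive class prior `pi` as a known input, and uses the hypothetical function name `nn_pu_risk`. It does not reproduce the GE-KL or GE-binomial objectives, which are not defined in this text.

```python
import numpy as np

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return np.logaddexp(0.0, z)

def nn_pu_risk(pos_logits, unl_logits, pi):
    """Non-negative positive-unlabeled risk with a logistic loss.

    pos_logits: classifier logits on labeled positive regions
    unl_logits: classifier logits on unlabeled regions
    pi:         assumed fraction of positives among unlabeled regions
    """
    pos_logits = np.asarray(pos_logits, dtype=float)
    unl_logits = np.asarray(unl_logits, dtype=float)
    risk_pos_as_pos = softplus(-pos_logits).mean()  # positives scored as positive
    risk_pos_as_neg = softplus(pos_logits).mean()   # positives scored as negative
    risk_unl_as_neg = softplus(unl_logits).mean()   # unlabeled scored as negative
    # Estimated risk on the true negatives; clamped at zero so that it
    # cannot become negative when the model overfits the labeled positives.
    neg_risk = risk_unl_as_neg - pi * risk_pos_as_neg
    return float(pi * risk_pos_as_pos + max(0.0, neg_risk))

# Uninformative classifier (all logits zero): the risk equals log(2).
print(nn_pu_risk([0.0], [0.0], pi=0.5))
```

The naive PN objective in the caption corresponds to treating every unlabeled region as a negative; the estimator above instead subtracts the expected contribution of the positives hidden in the unlabeled set, and the clamp keeps that correction from driving the estimated negative risk below zero.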


