J Phys Chem B. 2012 Jun 14;116(23):6654-64.
doi: 10.1021/jp211052j. Epub 2012 Feb 13.

Further evidence for the likely completeness of the library of solved single domain protein structures

Jeffrey Skolnick et al. J Phys Chem B. 2012.

Abstract

Recent studies have questioned whether the Protein Data Bank (PDB) contains all compact, single domain protein structures. Here, we show that all quasi-spherical (QS) random protein structures devoid of secondary structure are in the PDB and are excellent templates for all native PDB proteins up to 250 residues. Because QS templates have a global contour similar to that of the native structure, TASSER can refine 98% (90%) of those whose TM-score is 0.4 (0.35) to structures at or above the 0.5 TM-score threshold for CATH/SCOP assignment, with a mean TM-score of 0.74 (0.64). On the basis of this and the fact that, at a TM-score of 0.4, 83% (90%) of all (internal) core secondary structure elements are recovered, a TM-score of 0.40 is an appropriate fold similarity assignment threshold. Despite the claims of Taylor, Trovato, and Zhou that many of their structures lack a PDB counterpart, fr-TM-align finds essentially all (most) of them in the PDB at a TM-score threshold of 0.45 (0.5). Thus, the conclusion that the PDB is likely complete is further supported.
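
The abstract argues in terms of TM-score thresholds (0.35, 0.40, 0.45, 0.50) without stating how the score is computed. As a rough illustration only, the sketch below uses the commonly cited TM-score definition, which is not given in this record: a sum of 1/(1 + (d_i/d0)^2) over aligned residue pairs, normalized by the target length, with d0 = 1.24(L - 15)^(1/3) - 1.8 Å. The function names, the 0.5 Å floor on d0 for short chains, and the input distances are assumptions made for this example; the superposition and alignment search performed by tools such as fr-TM-align are not reproduced here.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms).

    Uses the commonly cited form d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    The 0.5 floor for short chains is an assumption of this sketch.
    """
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(aligned_distances, target_length):
    """TM-score for a fixed alignment and superposition.

    aligned_distances: distances (angstroms) between aligned residue pairs.
    Unaligned target residues contribute nothing to the sum but still
    count in the normalization by target_length.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length


def same_fold(score, threshold=0.40):
    """Apply a fold-similarity cutoff; 0.40 is the threshold the abstract proposes."""
    return score >= threshold
```

For example, a hypothetical 100-residue target with 50 residues aligned at zero distance and 50 unaligned scores 0.5, which passes both the proposed 0.40 cutoff and the 0.5 threshold the abstract associates with CATH/SCOP assignment.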

Figures

Figure 1
The upper panel shows the structures of the QS protein of the same length as 3hkxa and the target protein 101m_; the middle panel shows the smoothed structures generated by the application of eq. 2, with the left hand panel showing the 3hkxa regions aligned to 101m_. The lower panel shows the structural superposition, with the target (template) indicated by the thick (thin) tube. The TM-score of the alignment of the QS template to 101m_ is 0.47, whereas the TM-score of the smoothed pair of structures is 0.55.
Figure 2
For the top 100 structural alignments of the PDB200 set to the PDB set, the fraction of aligned target secondary structural elements fsec given by eq. 4a (dashed line) and the fraction of internal aligned secondary structural elements fsecint (solid line) given by eq. 4b (PDB internal) versus the TM-score of the template structure to the native target.
Figure 3
Comparison of the fraction of structurally aligned residues in the regular core secondary structure regions (black) and loop regions (red) for the PDB200 (dashed lines) and smoothed PDB200sm (solid lines) sets to the top 100 structures in the PDBsm set.
Figure 4
Cumulative fraction of target structures whose best QS template (dot-dot-dashed), smoothed QS template (dashed), and first-ranked TASSER model have a TM-score ≥ the value on the abscissa.
Figure 5
For the PDB250 target set and the QS300 template set. Upper panel: comparison of the TM-score to native of the QS-sm template to that of the corresponding QS template to native. Middle panel: comparison of the TM-score to native of the TASSER model to that of the corresponding QS template to native. Lower panel: comparison of the TM-score to native of the TASSER model to that of the corresponding QS-sm template to native.
Figure 6
For a given initial TM-score of the best QS300 template to the PDB250 structure (indicated by both the figure legend and the dashed line), the cumulative fraction of targets whose top (first-ranked) TASSER model has a TM-score ≥ the value on the abscissa. The TM-score threshold of 0.40 is indicated by the dotted line. The bottom right hand panel, for an initial TM-score of 0.45, follows the same convention and additionally shows, as the dot-dashed line, the cumulative fraction of targets whose first-ranked TASSER model has a TM-score ≥ the value on the abscissa when the initial best QS template TM-score is 0.49.
Figure 7
A. Cumulative fraction of QS200, QS200sm, Taylor, Trovato, Zhou, PDB200, and PDB200sm targets whose TM-score is ≥ the value on the abscissa for the templates in the PDB library. B. Same target sets as in A but using the PDB300 library as templates.
Figure 8
In the top, middle and lower panel for the Taylor, Zhou and QS200 sets, the cumulative fraction of targets that have a match to the PDB and PDB300 template library as a function of TM-score. Also shown are the cumulative fractions of targets that have a best TM-score template for the contour smoothed targets and templates as indicated by PDBsm and PDB300sm.
Figure 9
Fraction of unmatched targets in the Taylor, Zhou and QS sets to the PDB library as a function of the number of target protein residues. Red (black) indicates a TM-score threshold of 0.45 (0.50). Dashed (solid) lines are for the original (smoothed) structure.
Figure 10
Example of a significant structural match to the permuted 1a8la1 structure (1a8la1_46) identified in Dai and Zhou (31) as lacking a match in the PDB. Left hand side: the TM-score of 1rw8A to 1al8a1_46 is 0.50. Right hand side: structural alignment of the smoothed 1al8a1_46 to 1rw8A; the corresponding TM-score is 0.52.
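
Several of the captions above (Figures 4, 6, 7, and 8) plot a cumulative fraction of targets whose TM-score is at or above the value on the abscissa. A minimal sketch of that quantity is shown below; the function name and the example scores are hypothetical and are not data from the paper.

```python
def cumulative_fraction(scores, thresholds):
    """For each threshold, return the fraction of scores >= that threshold.

    scores: best TM-score found for each target (hypothetical inputs).
    thresholds: abscissa values at which to evaluate the curve.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    return [sum(1 for s in scores if s >= t) / n for t in thresholds]


# Hypothetical best-template TM-scores for five targets.
example_scores = [0.35, 0.42, 0.47, 0.55, 0.61]
example_curve = cumulative_fraction(example_scores, [0.40, 0.50])
```

With these made-up scores, four of the five targets reach 0.40 and two reach 0.50, so the curve falls as the threshold rises, which is the shape the captions describe.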
