Bioessays. 2012 Dec;34(12):1060-9.
doi: 10.1002/bies.201200116. Epub 2012 Oct 26.

When a domain is not a domain, and why it is important to properly filter proteins in databases: conflicting definitions and fold classification systems for structural domains make filtering of such databases imperative

Clare-Louise Towse et al. Bioessays. 2012 Dec.

Abstract

Membership in a protein domain database does not a domain make, a point we realized when generating a consensus view of protein fold space with our consensus domain dictionary (CDD). This dictionary was used to select representative structures for characterization of the protein dynameome: the Dynameomics initiative. Through this endeavor we rejected a surprising 40% of the 1,695 folds in the CDD as being non-autonomous folding units. Although some of this was due to the challenges of grouping similar fold topologies, the dissonance between the cataloguing and structural qualification of protein domains remains surprising. Another potential factor is previously overlooked intrinsic disorder; predictions suggest that 40% of proteins have either local or global disorder. One thing is clear: filtering a structural database and ensuring a consistent definition for protein domains is crucial, and caution is prescribed when generalizations of globular domains are drawn from unfiltered protein domain datasets.

Figures

Figure 1
Variations in protein domains within a protein fold family and variations between different fold families. A: The ferredoxin-like fold family with α-helices in red and β-strands in green. The structural elements shared by different members of the family are highlighted, with non-consensus structure in gray. B: Representative structures of the Top 30 most populated fold families from the Dynameomics project colored from N- to C-terminus (blue to red).
Figure 2
The steps taken in the creation of the v2009 CDD and the selection of the Dynameomics targets, with the breakdown of the rejected targets detailed. Targets classed as domains but rejected for quality or simulation constraints are boxed in green; those not considered to be domains at that time are boxed in red. The non-autonomous, irregular structures, unstable simulations and crystal structures with coordinate gaps, all less than 450 residues in length, form the set of 755 rejected targets that were surveyed for the possibility of disordered regions.
Figure 3
Metafold representatives with putative or confirmed disordered regions that were rejected from the Dynameomics project due to simulation instability. Targets have secondary structure colored blue. The final structures showing the conformational changes post-simulation have regions highlighted in red where disorder was predicted. A: HIV protein Vpu (PDB ID: 1vpu), which was predicted both here (DISOPRED2) and previously (PONDR) [32] to be substantially disordered, with correlating variation in the 80 ns MD ensemble (10 ns snapshots) and loss of secondary structure in the final structure. Inset is the DISOPRED2 prediction, with the previously predicted region also highlighted. B: Starting and final structures of five additional simulations with greater than 10 contiguous residues predicted to be disordered: 1t23; 1k0h; 7hsc; 1fu9; 1q3j.
Figure 4
Structures of targets that were initially rejected for being irregular, non-autonomous or containing significant gaps where coordinates could not be experimentally defined. PDB and DisProt codes are inset where membership applies. A: Irregular metafolds with secondary structure colored in blue. Disordered regions mostly coincide with the coil regions colored gray and are not highlighted. B: Metafolds with missing coordinates in X-ray structures; secondary structure is colored blue, with disordered regions highlighted in red. Where the disorder pertains to a missing region, the gap is marked with a dashed line. C: Non-autonomous or discontinuous metafolds. The target that is disordered in isolation or incapable of autonomous folding is colored purple, with interrupting structure or extraneous members of a complex shaded in gray. Predicted or experimentally confirmed disordered regions are highlighted in red; where this correlates with missing coordinates, a dashed line is used to denote the gap.
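
The Figure 3 caption singles out simulations with "greater than 10 contiguous residues predicted to be disordered". As an illustration only, the following is a minimal Python sketch of how such stretches could be located in a list of per-residue disorder scores. The function name `disordered_segments`, the score cutoff of 0.5 and the input format are assumptions made for this example; they are not taken from the article or from the DISOPRED2 software.

```python
from typing import List, Tuple


def disordered_segments(
    scores: List[float],
    threshold: float = 0.5,   # assumed cutoff, not a value from the article
    min_length: int = 11,     # "greater than 10 contiguous residues"
) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) residue ranges in which every
    score is at or above `threshold` and the run has at least `min_length`
    residues."""
    segments: List[Tuple[int, int]] = []
    start = None  # first residue of the run currently being tracked
    for i, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = i
        else:
            # The run (if any) ended at residue i - 1.
            if start is not None and i - start >= min_length:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start + 1 >= min_length:
        segments.append((start, len(scores)))
    return segments


# Hypothetical scores: a 12-residue run, a 10-residue run, an 11-residue run.
example = [0.9] * 12 + [0.1] * 5 + [0.8] * 10 + [0.2] * 3 + [0.7] * 11
print(disordered_segments(example))
```

In this hypothetical input the 10-residue run is skipped because it does not exceed 10 residues, while the 12- and 11-residue runs are reported.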

References

    1. Marchler-Bauer A, Anderson JB, Derbyshire MK, DeWeese-Scott C, et al. CDD: a conserved domain database for interactive domain family analysis. Nucleic Acids Res. 2007;35:D237–40.
    2. Kendrew J, Bodo G, Dintzis H, Parrish R, et al. A three-dimensional model of the myoglobin molecule obtained by X-ray analysis. Nature. 1958;181:662–6.
    3. Perutz MF, Rossmann MG, Cullis AF, Muirhead H, et al. Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5-Å resolution, obtained by X-ray analysis. Nature. 1960;185:416–22.
    4. Levitt M, Chothia C. Structural patterns in globular proteins. Nature. 1976;261:552–8.
    5. Grant A, Lee D, Orengo C. Progress towards mapping the universe of protein folds. Genome Biol. 2004;5:107.
