Intrinsic disorder in the Protein Data Bank
- PMID: 17206849
- DOI: 10.1080/07391102.2007.10507123
Abstract
The Protein Data Bank (PDB) is the preeminent source of protein structural information. PDB contains over 32,500 experimentally determined 3-D structures solved using X-ray crystallography or nuclear magnetic resonance spectroscopy. Intrinsically disordered regions fail to form a fixed 3-D structure under physiological conditions. In this study, we compare the amino-acid sequences of proteins whose structures are determined by X-ray crystallography with the corresponding sequences from the Swiss-Prot database. The analyzed dataset includes 16,370 structures, which represent 18,101 PDB chains and 5,434 different proteins (2,793 eukaryotic, 2,109 bacterial, 288 viral, and 244 archaeal) from 910 different organisms. In this dataset, on average, each Swiss-Prot protein is represented by 7 PDB chains, with 76% of the crystallized regions being represented by more than one structure. Intriguingly, the complete sequences of only approximately 7% of proteins are observed in the corresponding PDB structures, and only approximately 25% of the total dataset have >95% of their lengths observed in the corresponding PDB structures. This suggests that the vast majority of PDB proteins are shorter than their corresponding Swiss-Prot sequences and/or contain numerous residues that are not observed in maps of electron density. To determine the prevalence of disordered regions in PDB, the residues in the Swiss-Prot sequences were grouped into four general categories: "Observed" (which correspond to structured regions), "Not observed" (regions with missing electron density, potentially disordered), "Uncharacterized," and "Ambiguous," depending on their appearance in the corresponding PDB entries. This non-redundant set of residues can be viewed as a 'fragment' or empirical domain database that contains a set of experimentally determined structured regions or domains and a set of experimentally verified disordered regions or domains.
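The four-way grouping described above can be illustrated with a minimal Python sketch. The abstract does not spell out the exact rules, so the definitions here are assumptions: a residue is taken as "Uncharacterized" when no PDB chain covers it, "Observed" when every covering chain has coordinates for it, "Not observed" when every covering chain lacks it, and "Ambiguous" when the covering chains disagree. The function name `classify_residues` and the `(start, end, missing)` tuple layout are hypothetical and chosen only for illustration.

```python
def classify_residues(seq_len, chains):
    """Label each position of a Swiss-Prot sequence (1-based).

    chains: list of (start, end, missing) tuples, where start/end give the
    inclusive sequence range covered by one PDB chain and missing is the set
    of positions in that range without coordinates (hypothetical layout).
    """
    labels = []
    for pos in range(1, seq_len + 1):
        # Chains whose crystallized construct includes this position.
        covering = [c for c in chains if c[0] <= pos <= c[1]]
        if not covering:
            labels.append("Uncharacterized")  # assumed: no chain covers it
            continue
        n_missing = sum(pos in c[2] for c in covering)
        if n_missing == 0:
            labels.append("Observed")         # seen in every covering chain
        elif n_missing == len(covering):
            labels.append("Not observed")     # missing in every covering chain
        else:
            labels.append("Ambiguous")        # covering chains disagree
    return labels


# Toy example: a 10-residue protein covered by two overlapping chains.
example = classify_residues(10, [(2, 8, {2, 3}), (3, 9, {3, 8})])
```

In this toy case position 8 is missing in one chain and present in the other, so it is the only residue labeled "Ambiguous" under the assumed rules.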
We studied the propensities and properties of residues in these four categories and analyzed their relations to the predictions of disorder using several algorithms. "Not observed," "Ambiguous," and "Uncharacterized" regions were shown to possess the amino acid compositional biases typical of intrinsically disordered proteins. The application of four different disorder predictors (PONDR® VL-XT, VL3-BA, VSL1P, and IUPred) revealed that the vast majority of residues in the "Observed" dataset are ordered, and that the "Not observed" regions are mostly disordered. The "Uncharacterized" regions possess some tendency toward order, whereas the predictions for the short "Ambiguous" regions are genuinely ambiguous. Long "Ambiguous" regions (>70 amino acid residues) are mostly predicted to be ordered, suggesting that they are likely to be "wobbly" domains. Overall, we showed that completely ordered proteins are not highly abundant in PDB and many PDB sequences have disordered regions. In fact, in the analyzed dataset approximately 10% of the PDB proteins contain regions of consecutive missing or ambiguous residues longer than 30 amino acids, and approximately 40% of the proteins possess short regions (≥10 and <30 amino acids long) of missing and ambiguous residues.
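The region-length statistics above depend on finding runs of consecutive missing or ambiguous residues. The following Python sketch shows one way such runs could be located and binned, given a per-residue label list. It is not the authors' procedure: the helper names `disorder_runs` and `bin_protein` are hypothetical, and treating a run of exactly 30 residues as "long" is an assumption, since the abstract gives "longer than 30" for long regions and "<30" for short ones.

```python
from itertools import groupby

# Labels counted as missing or ambiguous when looking for consecutive runs.
DISORDER_LIKE = {"Not observed", "Ambiguous"}


def disorder_runs(labels):
    """Return (start, length) for each run of missing/ambiguous residues.

    start is a 1-based position in the label list.
    """
    runs = []
    pos = 0
    for flag, group in groupby(labels, key=lambda lab: lab in DISORDER_LIKE):
        n = len(list(group))
        if flag:
            runs.append((pos + 1, n))
        pos += n
    return runs


def bin_protein(labels, short_min=10, long_min=30):
    """Flag whether a protein has short (short_min..long_min-1) or long runs.

    long_min=30 is an assumed cutoff; adjust it if a strict '>30' is intended.
    """
    lengths = [n for _, n in disorder_runs(labels)]
    return {
        "long": any(n >= long_min for n in lengths),
        "short": any(short_min <= n < long_min for n in lengths),
    }


# Toy example: a 12-residue mixed run followed by a 30-residue missing tail.
toy = (["Observed"] * 5 + ["Not observed"] * 8 + ["Ambiguous"] * 4
       + ["Observed"] * 3 + ["Not observed"] * 30)
toy_runs = disorder_runs(toy)
toy_bins = bin_protein(toy)
```

Adjacent "Not observed" and "Ambiguous" residues are merged into a single run here, which follows the abstract's wording of "consecutive missing or ambiguous residues".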