Comparative Study
BMC Genomics. 2007 Jun 12:8:163.
doi: 10.1186/1471-2164-8-163.

The relationships between the isoelectric point and: length of proteins, taxonomy and ecology of organisms

Joanna Kiraga et al. BMC Genomics. 2007.

Abstract

Background: The distribution of the isoelectric point (pI) of proteins within a proteome is universal across organisms. It is bimodal, dividing the proteome into two sets: acidic and basic proteins. Species differ, however, in the relative abundance of acidic and basic proteins, and this difference may be correlated with taxonomy, subcellular localization, the ecological niche of the organism and proteome size.

Results: We analysed 1784 proteomes encoded by chromosomes of Archaea, Bacteria and Eukaryota, as well as by mitochondria, plastids, prokaryotic plasmids, phages and viruses. In more than 95% of proteomes we found a significant correlation between protein length and pI: positive for acidic proteins and negative for basic ones. Plastids, viruses and plasmids encode more basic proteomes, whereas chromosomes of Archaea, Bacteria and Eukaryota, mitochondria and phages encode more acidic ones. Mitochondrial proteomes of Viridiplantae, Protista and Fungi are more basic than those of Metazoa. This results from the presence of basic proteins in the former proteomes and their absence from the latter, and is related to the reduction of metazoan genomes. The pI bias of proteomes encoded by prokaryotic chromosomes correlated significantly with that of proteomes encoded by plasmids, but there was no correlation between eukaryotic nuclear-coded proteomes and proteomes encoded by organelles. Detailed analyses of prokaryotic proteomes showed significant relationships between pI distribution and habitat, relation to the host cell and salinity of the environment, but no significant correlation with oxygen or temperature requirements. Salinity is positively correlated with the acidity of proteomes. Host-associated organisms, and especially intracellular species, have more basic proteomes than free-living ones. The higher rate of mutation accumulation in intracellular parasites and endosymbionts is responsible for the basicity of their tiny proteomes, which explains the observed correlation between decreasing genome size and increasing basicity of proteomes. The results indicate that even conserved proteins subject to strong selective constraints follow the global trend in the pI distribution.

Conclusion: The distribution of protein pI in proteomes shows clear relationships with protein length, subcellular localization, taxonomy and ecology of organisms. The distribution is also strongly affected by mutational pressure, especially in intracellular organisms.


Figures

Figure 1
Histograms of pI values at 0.1 unit intervals (left panel) and relationship between length of proteins (log L) and their pI (right panel) for selected prokaryotic proteomes with different pI bias (b): (A) Natronomonas pharaonis DSM 2160, b = -87%; (B) Silicibacter pomeroyi DSS-3, b = -44%; (C) Ehrlichia ruminantium str. Gardel, b = 0%; (D) Mycoplasma pneumoniae M129, b = 43%; (E) Wigglesworthia glossinidia, b = 86%. Black points represent the set of acidic proteins and grey points the set of basic proteins. Diagrams for all analysed proteomes are available in additional data files 2, 3 and 4.
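The caption reports a pI bias b and histograms binned at 0.1 pI units, but it does not define b. The Python sketch below shows one way such numbers could be computed. It assumes, purely for illustration, that b is the percentage difference between the counts of basic and acidic proteins, and that the two sets are separated at a hypothetical threshold `split`; neither the formula nor the threshold value is taken from the text above.

```python
import math
from collections import Counter


def pi_bias(pi_values, split=7.0):
    """Percentage excess of basic over acidic proteins.

    ASSUMPTION: b = 100 * (n_basic - n_acidic) / (n_basic + n_acidic),
    with `split` (hypothetical default 7.0) separating the two sets.
    """
    values = list(pi_values)
    if not values:
        raise ValueError("need at least one pI value")
    n_basic = sum(1 for pi in values if pi >= split)
    n_acidic = len(values) - n_basic
    return 100.0 * (n_basic - n_acidic) / (n_basic + n_acidic)


def pi_histogram(pi_values, width=0.1):
    """Count proteins per pI bin; keys are integer bin indices (pI // width)."""
    counts = Counter(math.floor(pi / width) for pi in pi_values)
    return dict(sorted(counts.items()))
```

Under this assumed definition, a proteome with equal numbers of acidic and basic proteins gives b = 0%, and negative values indicate an excess of acidic proteins, which matches the sign convention suggested by the caption.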
Figure 2
Distributions of the correlation coefficients between pI value and length of proteins calculated separately for acidic and basic sets of proteomes.
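As an illustration of the per-proteome calculation behind such a distribution, the following Python sketch computes the Pearson correlation between log protein length and pI separately for the acidic and basic sets of one proteome. The threshold `split` and the toy data in the usage note are hypothetical, and the source does not state which correlation coefficient was used; Pearson's r is assumed here.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def length_pi_correlations(proteins, split=7.0):
    """Return (r_acidic, r_basic) for a list of (length, pI) pairs.

    ASSUMPTION: proteins with pI < split count as acidic, the rest as basic;
    the value 7.0 is a placeholder, not the threshold used in the study.
    """
    acidic = [(math.log10(length), pi) for length, pi in proteins if pi < split]
    basic = [(math.log10(length), pi) for length, pi in proteins if pi >= split]
    return pearson(*zip(*acidic)), pearson(*zip(*basic))
```

With made-up data such as `[(100, 4.5), (200, 5.0), (400, 5.5), (800, 6.0), (100, 11.0), (200, 10.0), (400, 9.0), (800, 8.0)]`, the function returns a positive coefficient for the acidic set and a negative one for the basic set, the pattern the abstract reports for most real proteomes.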
Figure 3
Statistical analysis of the pI bias of different groups of proteomes and their UPGMA-based clustering according to the median of the pI bias. Numbers at nodes give the percentage support based on the subsampling method, and asterisks denote results of WLS-LRT/F tests (both with p < 0.001).
Figure 4
Statistical analysis of the pI bias of mitochondrial proteomes and their UPGMA-based clustering according to the median of the pI bias. Numbers at nodes give the percentage support based on the subsampling method, and asterisks denote results of WLS-LRT/F tests (both with p < 0.001).
Figure 5
Ratios of the observed to expected number of proteomes in a given class of pI bias for different ecological classifications: (A) oxygen, (B) temperature, (C) salinity, (D) habitat and (E) relation to host cell.
Figure 6
Relationship between the pI bias and: (A) logarithm of proteome size and (B) genomic GC content for different ecological groups of prokaryotes.
Figure 7
Distribution of pI for three sets of proteins: proteins present in E. coli K12 only and not in the intracellular endosymbiont (only in Ec), E. coli proteins orthologous to those of the intracellular bacterium [common (Ec)], and endosymbiont proteins orthologous to those of E. coli [common (initials of endosymbiont)]. Comparisons are made for E. coli K12 with: (A) Buchnera aphidicola str. Bp, (B) Candidatus Blochmannia floridanus and (C) Wigglesworthia glossinidia.
Figure 8
Isoelectric point and GC content. (A) The relationship between the computer-generated pI bias and the genomic GC content of prokaryotic organisms. The generated pI bias is the average calculated for 100 virtual proteomes generated for each organism assuming the same length distribution of proteins as in real proteomes and the amino acid composition calculated from the base composition characteristic for the given genome. (B) The relationship between the fraction of the basic and acidic amino acids and GC content.
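The link between base composition and the fraction of charged amino acids described in panel (B) can be sketched in Python as follows. This is a simplified model, not the authors' procedure: it assumes the standard genetic code, independent nucleotides with G = C and A = T at every codon position, and it counts only K and R as basic and D and E as acidic. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_frequencies(gc):
    """Expected amino acid frequencies for a genome with GC fraction `gc`.

    ASSUMPTION: each codon position is drawn independently with
    P(G) = P(C) = gc / 2 and P(A) = P(T) = (1 - gc) / 2; stop codons
    are discarded and the remaining probabilities renormalised.
    """
    if not 0.0 < gc < 1.0:
        raise ValueError("gc must lie strictly between 0 and 1")
    base_p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    freqs = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons
        p = base_p[codon[0]] * base_p[codon[1]] * base_p[codon[2]]
        freqs[aa] = freqs.get(aa, 0.0) + p
    total = sum(freqs.values())
    return {aa: p / total for aa, p in freqs.items()}


def charged_fractions(gc, basic="KR", acidic="DE"):
    """Return (basic fraction, acidic fraction); the residue sets are assumptions."""
    freqs = amino_acid_frequencies(gc)
    return sum(freqs[a] for a in basic), sum(freqs[a] for a in acidic)
```

Calling `charged_fractions(gc)` over a range of GC values gives a curve comparable in spirit to panel (B). Building full virtual proteomes as in panel (A) would additionally require sampling sequences with the real length distribution and computing a pI for each, which this sketch leaves out.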


