Comparative Study
Nucleic Acids Res. 2002 Apr 1;30(7):1427-64.
doi: 10.1093/nar/30.7.1427.

Comparative genomics and evolution of proteins involved in RNA metabolism


Vivek Anantharaman et al. Nucleic Acids Res.

Abstract

RNA metabolism, broadly defined as the compendium of all processes that involve RNA, including transcription, processing and modification of transcripts, translation, RNA degradation and its regulation, is the central and most evolutionarily conserved part of cell physiology. A comprehensive, genome-wide census of all enzymatic and non-enzymatic protein domains involved in RNA metabolism was conducted by using sequence profile analysis and structural comparisons. Proteins related to RNA metabolism comprise from 3 to 11% of the complete protein repertoire in bacteria, archaea and eukaryotes, with the greatest fraction seen in parasitic bacteria with small genomes. Approximately one-half of protein domains involved in RNA metabolism are present in most, if not all, species from all three primary kingdoms and are traceable to the last universal common ancestor (LUCA). The principal features of LUCA's RNA metabolism system were reconstructed by parsimony-based evolutionary analysis of all relevant groups of orthologous proteins. This reconstruction shows that LUCA possessed not only the basal translation system, but also the principal forms of RNA modification, such as methylation, pseudouridylation and thiouridylation, as well as simple mechanisms for polyadenylation and RNA degradation. Some of these ancient domains form paralogous groups whose evolution can be traced back in time beyond LUCA, towards low-specificity proteins, which probably functioned as cofactors for ribozymes within the RNA world framework. The main lineage-specific innovations of RNA metabolism systems were identified. The most notable phase of innovation in RNA metabolism coincides with the advent of eukaryotes and was brought about by the merger of the archaeal and bacterial systems via mitochondrial endosymbiosis, but also involved the emergence of several new, eukaryote-specific RNA-binding domains. Subsequent, vast expansions of these domains mark the origin of alternative splicing in animals and probably in plants. In addition to the reconstruction of the evolutionary history of RNA metabolism, this analysis produced numerous functional predictions, e.g. of previously undetected enzymes of RNA modification.

Figures

Figure 1
Absolute counts of proteins containing domains involved in RNA metabolism in completely sequenced genomes. (A) RNA metabolism enzymes found in all three superkingdoms. (B) Enzymes of RNA metabolism restricted to one or two superkingdoms. (C) RNA-binding and interaction domains found in two or three superkingdoms. (D) RBDs restricted to eukaryotes. For species abbreviations, see Materials and Methods. The domain names/acronyms are as in Table 1.
Figure 2
An evolutionary scheme for Rossmann-fold RNA methylases. The conserved families and their probable temporal points of origin are shown for each of the major lineages. Dotted ellipses indicate the general phylogenetic affinities whose branching points could not be more precisely assigned, and the dotted lines indicate temporal uncertainty regarding the point of emergence. The gray line indicates related DNA methylase lineages, which are not shown in detail. Any special domain architecture present in a given lineage is shown on the top. The motif at the end of strand 4 is shown at the first branchpoint of each major family. The methylase domains in the domain architectures are given respective family names. The lines are colored to indicate the phyletic pattern of the corresponding families: A, archaea (brown); B, bacteria (blue); E, PA, An or Ver, eukaryotes, plants and animals, animals, or vertebrates (green); AB, archaeo-bacterial (purple); AE, archaeo-eukaryotic (black); BE, bacterio-eukaryotic (yellow); and C, conserved (universal) (red). These phyletic pattern abbreviations are additionally shown next to the family names and are underlined. The family names are colored according to the function: black, modification; red, PTGR; blue, capping. The domain names/acronyms are as explained in Table 1; additionally, X is a possible OB fold domain specific to Trm5; G is G-Patch domain and ‘In. capping’ is an inactive mRNA capping enzyme.
Figure 3
An evolutionary scheme for other RNA modification enzymes. (A) Thiouridine synthases; the PP-loop ATPases not involved in RNA processing are shown in gray. (B) tRNA-dihydrouridine synthases. (C) Deaminases. (D) Pseudouridine synthases I and II. (E) Archaeosine/queuine synthases (transglycosylases). The conventions for color-coding and line patterns are the same as in Figure 2. The domain names/acronyms are: Rhod, rhodanese domain; DHFR, dihydrofolate reductase; Uridine K, uridine kinase; wHTH, winged helix–turn–helix domain; ZR, zinc ribbon; and Crich, cysteine rich. The remaining designations are as in Table 1.
Figure 4
An evolutionary scheme for RNA helicases and related ATPases. The family names are colored according to their function: black, splicing and processing; red, PTGR; orange, translation. The lineages containing DNA helicases are shown in gray. The conventions for color-coding and line patterns are the same as in Figure 2. The domain names/acronyms are as explained in Table 1. Additionally: ank, ankyrin repeats; Crich and Crich2, different lineage-specific cysteine-rich domains; RQC, C-terminal domain of RecQ family helicases; Zr, zinc ribbon; and HRD, the C-terminal domain common to RecQ and RNase D.
Figure 5
A Venn diagram of phyletic patterns of RNA metabolism systems. The number of orthologous groups of proteins detected in each lineage is shown according to their functions. Each number in a given compartment of a Venn diagram is exclusive of the numbers in the other compartments. The number in the intersection of two circles is the number of orthologous groups shared by the two lineages (e.g. 20 groups in the AB compartment of the overall counts represents the 20 orthologous groups shared by the archaeal and bacterial lineages), whereas the intersection of three circles shows the number of orthologous groups shared by three lineages. A, archaea; B, bacteria; E, eukaryotes; P, plants; An, animals; F, fungi; C, C. elegans; D, D. melanogaster; H, H. sapiens.
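The exclusive-compartment counting described in this caption can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the group identifiers and the toy presence data are all hypothetical, and it only assumes that each orthologous group has already been assigned the set of lineages (A, B, E) in which it occurs.

```python
from collections import Counter

# Lineage codes follow the caption: A, archaea; B, bacteria; E, eukaryotes.
LINEAGE_ORDER = "ABE"


def phyletic_pattern(lineages):
    """Return a compartment label such as 'A', 'AB' or 'ABE'."""
    present = set(lineages)
    unknown = present - set(LINEAGE_ORDER)
    if unknown:
        raise ValueError(f"unknown lineage codes: {sorted(unknown)}")
    return "".join(code for code in LINEAGE_ORDER if code in present)


def exclusive_counts(groups):
    """Count orthologous groups per Venn compartment.

    Each group falls into exactly one compartment, so the counts are
    mutually exclusive. Groups with no lineage assigned are skipped.
    """
    return Counter(
        phyletic_pattern(lineages) for lineages in groups.values() if lineages
    )


def shared_by(counts, lineages):
    """Inclusive total: groups present in at least the given lineages."""
    required = set(lineages)
    return sum(n for pattern, n in counts.items() if required <= set(pattern))


# Hypothetical toy data: group identifier -> lineages in which it is found.
toy_groups = {
    "og1": {"A", "B", "E"},
    "og2": {"A", "B"},
    "og3": {"A", "B"},
    "og4": {"E"},
    "og5": {"B", "E"},
}

counts = exclusive_counts(toy_groups)
```

In this toy data the AB compartment holds two groups (those found in archaea and bacteria but not eukaryotes), whereas `shared_by(counts, "AB")` also includes the universal group. Keeping the two tallies separate mirrors the caption's point that compartment numbers do not overlap.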
Figure 6
An evolutionary scenario for the PTGR systems. The architectures and names of proteins involved in PTGR are shown attached to the nodes, at which they are inferred to have been derived. Architectures with stand-alone domains and repeats of a single domain are shown by names; architectures are separated by semicolons. In some cases, the details of phyletic patterns are indicated as: PA, plant and animals; AF, animals and fungi; CH, nematodes and vertebrates. When a given architecture is represented in more than one orthologous group, the number of groups is given alongside the name (bold and underlined numbers denote distinct, distantly related groups; plain numbers denote recently diverged paralogous groups). The range of numbers under a repeated domain (example 7..10) denotes the range seen in individual proteins of the corresponding group. The domain names/acronyms are as given in Table 1. Additionally: S1L, S1-like; MI, MA-3 and eIF4G; Cr3, cysteine-rich domain with three cysteines; Crich, Crich1, Crich2, different lineage-specific cysteine-rich domains; RQC, C-terminal domain of RecQ family helicases; ank, ankyrin repeats; SAM, S-adenosylmethionine-binding domain; TPR, tetratricopeptide repeats; FHA, fork head-associated domain; RNA Lig, RNA ligase; F, F box; M-bet-lac, metallo-β-lactamase; DHH, hydrolase of the DHH family; Corn, cornase domain; HD, hydrolase of the HD superfamily; ZR, zinc ribbon; 2C, an uncharacterized domain with two conserved cysteines; X, X1, different lineage-specific uncharacterized domains. The boxes are marked: C, ancient conserved (universal); A, archaea; B, bacteria; AE, archaeo-eukaryotic; E, eukaryotes; P, plants; CDH, animals; F, fungi; Ce, nematodes; D, arthropods; H, vertebrates; and DH, arthropods and vertebrates.
Figure 7
Architectural diversity and points of origin of the splicing machinery components. The conventions for the representation of architecture, points of derivation and lineage abbreviations are as in Figure 6. The domain names/acronyms are as given in Table 1. Additionally: G, G-Patch domain; WW, conserved domain with two characteristic tryptophans; Sp.Ht., specialized HEAT repeats; WD, WD40 repeats; Crich, lineage-specific cysteine-rich domain; At_X, an uncharacterized domain expanded in Arabidopsis; ‘Inac. Lar.’, inactive lariat-debranching enzyme; ZR, zinc ribbon; HIT, histidine triad domain.
Figure 8
Architectural diversity of proteins that link RNA metabolism with protein degradation and folding–unfolding. The conventions for the representation of architecture, points of derivation and lineage abbreviations are as in Figure 6. Protein functions are shown in bold type and explained in the key. The domain names/acronyms are as given in Table 1. Additionally: Ubq, ubiquitin; mpase, metalloprotease; OTA20, OTU-A20-like protease; WD, WD40 repeats; UBC-HD, ubiquitin C-terminal hydrolase; RF, RING finger; Ub-Znf, ubiquitin-specific Zn-finger; UBA, ubiquitin-associated domain; ‘acyl. ph.’, acyl phosphatase; ZR, zinc ribbon; X, a lineage-specific uncharacterized domain.
