Unique and conserved features of genome and proteome of SARS-coronavirus, an early split-off from the coronavirus group 2 lineage

doi:10.1016/s0022-2836(03)00865-9

Comparative Study

J Mol Biol. 2003 Aug 29;331(5):991-1004.


Eric J Snijder¹, Peter J Bredenbeek, Jessika C Dobbe, Volker Thiel, John Ziebuhr, Leo L M Poon, Yi Guan, Mikhail Rozanov, Willy J M Spaan, Alexander E Gorbalenya


PMID: 12927536
PMCID: PMC7159028
DOI: 10.1016/s0022-2836(03)00865-9


Eric J Snijder et al. J Mol Biol. 2003.


Affiliation

¹ Department of Medical Microbiology, Leiden University Medical Center, Room L4-34, Albinusdreef 2, PO Box 9600, 2300 RC, Leiden, The Netherlands.


Abstract

The genome organization and expression strategy of the newly identified severe acute respiratory syndrome coronavirus (SARS-CoV) were predicted using recently published genome sequences. Fourteen putative open reading frames were identified, 12 of which were predicted to be expressed from a nested set of eight subgenomic mRNAs. The synthesis of these mRNAs in SARS-CoV-infected cells was confirmed experimentally. The 4382- and 7073-amino-acid-residue SARS-CoV replicase polyproteins are predicted to be cleaved into 16 subunits by two viral proteinases (bringing the total number of SARS-CoV proteins to 28). A phylogenetic analysis of the replicase gene, using a distantly related torovirus as an outgroup, demonstrated that, despite a number of unique features, SARS-CoV is most closely related to group 2 coronaviruses. Distant homologs of cellular RNA processing enzymes were identified in group 2 coronaviruses, with four of them being conserved in SARS-CoV. These newly recognized viral enzymes place the mechanism of coronavirus RNA synthesis in a completely new perspective. Furthermore, together with previously described viral enzymes, they will be important targets for the design of antiviral strategies aimed at controlling the further spread of SARS-CoV.
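The counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only; it assumes that the two open reading frames not expressed from subgenomic mRNAs are the replicase ORFs, and that each of the 12 remaining ORFs contributes one protein, which is how the total of 28 appears to be reached. The variable names are hypothetical.

```python
# Counts taken from the abstract.
putative_orfs = 14
orfs_on_subgenomic_mrnas = 12
subgenomic_mrnas = 8
replicase_subunits = 16

# Assumption: the remaining ORFs are the two replicase ORFs (ORF1a, ORF1b).
replicase_orfs = putative_orfs - orfs_on_subgenomic_mrnas

# Assumption: one protein per ORF expressed from a subgenomic mRNA.
total_proteins = replicase_subunits + orfs_on_subgenomic_mrnas

# 12 ORFs on 8 mRNAs implies that some mRNAs carry more than one ORF.
extra_orfs_beyond_one_per_mrna = orfs_on_subgenomic_mrnas - subgenomic_mrnas

print(replicase_orfs, total_proteins, extra_orfs_beyond_one_per_mrna)
```

Under these assumptions the total comes to 28, in agreement with the figure given in the abstract.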


Figures

**Figure 1**
Overview of the SARS-CoV genome organization and expression. Comparison of the genome organizations of SARS-CoV and bovine coronavirus (BCoV). The replicase genes are depicted, with ORF1a, ORF1b, and ribosomal frameshift site indicated. Arrows represent sites in the corresponding replicase polyproteins that are cleaved by papain-like proteinases (orange) or the 3C-like cysteine proteinase (blue). Cleavage products are provisionally numbered nsp1–nsp16 (see also Table 1). In the 3′-terminal part of the genomes, homologous structural protein genes are indicated in matching colors. Close-ups of two regions with major differences are shown (and see the text). In the N-terminal half of replicase ORF1a, SARS-CoV lacks one of the PL^pro domains (indicated in orange/green in BCoV) and contains a unique insertion (SUD). In the region with structural and accessory protein genes, the locations of the body TRSs involved in subgenomic RNA synthesis are indicated with red boxes (see Figure 3 and Hofmann *et al*.⁷⁶). The bottom part of the Figure illustrates which parts of the genome are conserved in the genus *Coronavirus* and in the order Nidovirales (the ORF1a sequence of toroviruses, which largely remains to be sequenced, could not be included). Furthermore, it is indicated for which domains homologs have been identified in other RNA viruses and the cellular world. Enzymes for which structural data are available are shown in blue. SUD, SARS-CoV unique domain; PL^pro, papain-like cysteine proteinase; 3CL^pro, 3C-like cysteine proteinase; TM, transmembrane domain; ADRP, adenosine diphosphate-ribose 1″-phosphatase; ExoN, 3′-to-5′ exonuclease; CL^pro, chymotrypsin-like proteinase; RdRp, RNA-dependent RNA polymerase; HEL1, superfamily 1 helicase; XendoU, (homolog of) poly(U)-specific endoribonuclease; 2′-O-MT, S-adenosylmethionine-dependent ribose 2′-O-methyltransferase; CPD, cyclic phosphodiesterase. Domains Ac, X, and Y are described by Ziebuhr *et al*. and Gorbalenya *et al*.

**Figure 2**
Phylogenetic analysis of coronavirus replicase genes. SARS-CoV replicase ORF1b amino acid sequences (Entrez Genomes accession number NC_004718 (AY274119)) were compared with those from viruses representing the three coronavirus subgroups and the genus *Torovirus*. Group 1: transmissible gastroenteritis virus (TGEV), NC_002306; human coronavirus 229E (HCoV-229E), NC_002645; porcine epidemic diarrhea virus (PEDV), NC_003436. Group 2: mouse hepatitis virus A59 (MHV-A59), NC_001846; bovine coronavirus (BCoV-Lun), AF391542. Group 3: infectious bronchitis virus (IBV), strains Beaudette (NC_001451) and LX4 (AY223860). Torovirus: equine torovirus (EToV), X52374. A multiple protein alignment of these sequences was generated with the help of the ClustalX 1.82 program and was adjusted manually. Two regions of poor conservation were removed from the alignment, which was converted subsequently into the nucleotide form. All columns containing gaps were removed. The resulting alignment contains the following SARS-CoV sequences fused: 13,623–13,859, 14,310–18,857 and 20,076–21,482. It included 5487 characters, 3207 of them parsimony-informative. Using the PAUP program (version 4.0.0d55) and the parsimony criterion, an exhaustive tree search of the 135,135 evaluated trees identified the best tree, with a score of 10,927, and the second best tree, with a score of 10,964; the worst tree had a score of 13,611. A total of 1000 bootstrap trials were conducted using the parsimony criterion and a branch-and-bound search to generate a bootstrap 50% majority-rule consensus tree. The frequency of occurrence of particular bifurcations in bootstraps is indicated at the nodes. Similar trees with similarly high bootstrap support (above 960) were obtained using the NJ method that was applied to distance matrices obtained for either nucleotide or amino acid alignments (not shown).
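Two of the steps in this caption lend themselves to a short illustration. The sketch below is not the PAUP or ClustalX procedure used by the authors; it is a minimal Python example, with hypothetical function names and toy sequences, of (i) counting the unrooted bifurcating trees for a given number of taxa, (ii) removing alignment columns that contain gaps, and (iii) counting parsimony-informative columns under the usual definition (at least two character states, each present in at least two sequences).

```python
from collections import Counter


def unrooted_tree_count(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating trees for n taxa: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # 3 * 5 * ... * (2n - 5)
        count *= k
    return count


def drop_gap_columns(seqs: list[str]) -> list[str]:
    """Remove every alignment column that contains a gap character '-'."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    kept = [col for col in zip(*seqs) if "-" not in col]
    return ["".join(col[i] for col in kept) for i in range(len(seqs))]


def parsimony_informative_sites(seqs: list[str]) -> int:
    """Count columns with >= 2 states that each occur in >= 2 sequences."""
    informative = 0
    for col in zip(*seqs):
        counts = Counter(col)
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return informative


# Nine sequences are listed in the caption (SARS-CoV plus eight others).
print(unrooted_tree_count(9))

# Toy alignment, invented for illustration only.
toy = ["ACGT-A", "ACGTTA", "AGGATA", "AGCATC"]
trimmed = drop_gap_columns(toy)
print(trimmed, parsimony_informative_sites(trimmed))
```

For the nine sequences listed in the caption, the formula gives 135,135 trees, which matches the number of trees reported as evaluated in the exhaustive search.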

**Figure 3**
SARS-CoV subgenomic mRNA synthesis. (A) Organization of ORFs in the 3′ end of the SARS-CoV genome with predicted leader and body TRSs indicated by small boxes. The subgenomic mRNAs resulting from the use of these TRSs for leader-to-body fusion are depicted below, with mRNAs predicted to be functionally bicistronic indicated with an asterisk (∗). (B) Hybridization analysis of intracellular viral RNA from Vero cells infected with SARS-CoV, Frankfurt-1 (Fr) and HKU-39849 (HK) isolates. See Materials and Methods for technical details. Oligonucleotides complementary to sequences from the SARS-CoV leader sequence and to a region in the genomic 3′ end both recognized a set of nine RNA species (the genome (RNA1) and eight subgenomic RNAs), confirming the presence of common 5′ and 3′ sequences. RNA from Vero cells infected with avian infectious bronchitis virus (IBV), which produces only five subgenomic mRNAs of known sizes, was run in the same gel and used as a size marker. (C) Model for nidovirus subgenomic RNA synthesis by discontinuous extension of minus strands. Whereas genome replication relies on continuous minus strand synthesis (antigenome), subgenomic minus strands would be produced by attenuation of nascent strand synthesis at a body TRS (red bar), followed by translocation of the nascent strand to the leader TRS in the genomic template. Following base-pairing between the body TRS complement at the 3′ end of the minus strand and the leader TRS, RNA synthesis would resume to complete the subgenomic minus strand that would then serve as template for the transcription of subgenomic mRNAs.
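The outcome of leader-to-body fusion, a nested set of RNAs that share their 5′ leader and 3′ end, can be illustrated with a toy string model. The sketch below only reproduces the end products at the plus-strand level; it does not model minus-strand synthesis or base-pairing. The TRS motif and the genome string are arbitrary placeholders invented for the example and are not taken from the SARS-CoV sequence.

```python
def subgenomic_mrnas(genome: str, trs: str) -> list[str]:
    """Fuse the 5' leader to the sequence starting at each body TRS.

    The first TRS occurrence is treated as the leader TRS; every later
    occurrence is treated as a body TRS. Each product keeps one TRS copy
    at the leader-body junction and extends to the genomic 3' end.
    """
    leader_trs = genome.find(trs)
    if leader_trs == -1:
        raise ValueError("no TRS found in genome")
    leader = genome[:leader_trs]

    products = []
    pos = genome.find(trs, leader_trs + len(trs))
    while pos != -1:
        products.append(leader + genome[pos:])
        pos = genome.find(trs, pos + len(trs))
    return products


# Hypothetical motif and toy genome (placeholders, not real viral sequence).
TOY_TRS = "CUGAUC"
toy_genome = "GGAUUA" + TOY_TRS + "AAAACCCC" + TOY_TRS + "GGGGUUUU" + TOY_TRS + "CACACA"

for rna in [toy_genome] + subgenomic_mrnas(toy_genome, TOY_TRS):
    print(len(rna), rna)
```

In this toy case, two body TRSs give two subgenomic RNAs, so three RNA species in total share the same 5′ and 3′ ends. By the same logic, eight subgenomic RNAs plus the genome account for the nine species detected in panel (B).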

**Figure 4**
Sequence alignments of protein families that include cellular enzymes involved in RNA processing and their nidovirus homologs. Our in-depth comparative sequence analysis (see Materials and Methods) revealed a statistically significant relationship between functionally uncharacterized proteins (domains) of nidoviruses, including SARS-CoV, and five protein families that include enzymes involved in two nuclear RNA processing pathways: intron excision to produce mature tRNA and the production of intron-encoded box C/D small nucleolar RNA (snoRNA) from its host pre-mRNA (Figure 5). Shown are alignments for key regions of a few selected members of the following groups of enzymes: (A) XendoU family; (B) ExoN family; (C) 2′-O-MT family; (D) CPD family; and (E) ADRP family. These protein families may be known also under other names. Cellular homologs, not necessarily including proteins involved in the discussed RNA processing pathways, are listed in the top segment of each alignment and nidovirus proteins in the bottom segment. In the CPD family, along with group 2 coronavirus representatives, proteins of two rotaviruses (double-stranded RNA viruses), which were identified in this study, are listed. In both segments, residues are highlighted independently: black for absolutely conserved residues and different shades of grey to indicate different levels of conservation; amino acid similarity groups used were: (i) D, E, N, Q; (ii) S, T; (iii) K, R; (iv) F, W, Y; and (v) I, L, M, V. Positions occupied by identical or similar residues in all proteins under comparison are indicated with an asterisk (∗) and colon (:), respectively, in the inter-segment row. For the ExoN family, three motifs conserved in the DEDD superfamily and Zn-finger unique for the ExoN family are indicated. 
Database accession numbers for nidovirus genome sequences: SARS-CoV, Entrez Genomes accession number NC_004718 (AY274119); MHV-A59, NC_001846; BCoV-Lun, AF391542; HCoV-229E, NC_002645; IBV-B, NC_001451; PEDV, NC_003436; TGEV, NC_002306; equine torovirus (EToV), X52374; equine arteritis virus (EAV), X53459; porcine reproductive and respiratory syndrome virus (PRRSV), M96262; gill-associated virus (GAV), AF227196. Abbreviations and NCBI protein database ID number or SwissProt names of the remaining protein sequences are: (A) Npun 0562, hypothetical protein of *Nostoc punctiforme*, ZP_00106190; Poliv smB, pancreatic protein of *Paralichthys olivaceus*, BAA88246; Celeg Pp11, placental protein 11-like precursor of *Caenorhabditis elegans*, NP_492590; Xlaev endoU, endoU protein of *Xenopus laevis*, CAD45344; pp1b, ORF1b-encoded part of nidovirus replicase polyprotein 1ab. (B) Yeast PAN2, PAB-dependent poly(A)-specific ribonuclease subunit PAN2 of *Saccharomyces cerevisiae*, P53010; Mycge DPO3, DNA polymerase III polC-type, containing exonuclease domain, of *Mycoplasma genitalium*, P47277; Bacsu DING, probable ATP-dependent helicase dinG homolog, containing exonuclease domain, of *Bacillus subtilis*, P54394; Ecoli DP3E, DNA polymerase III, epsilon chain, containing exonuclease domain, of *Escherichia coli*, P03007 (PDB: 1J53 and 1J54); Ecoli RNT, exoribonuclease T of *Escherichia coli*, P30014.
(C) Hsap AKA, A-kinase anchoring protein 18 gamma of *Homo sapiens*, AAF28106; Athal CPD1, putative CPD1 of *Arabidopsis thaliana*, CAA16750; Athal CPD2, putative CPD2 of *Arabidopsis thaliana*, CAA16751; yeast YG59, hypothetical 26.7 kDa protein of yeast, P53314; Ecoli LIGT, 2′-5′ RNA ligase of *Escherichia coli*, P37025; ns2, non-structural protein (ORF2-encoded) of the coronaviruses HCoV-OC43 (AAA74377), BCoV-Quebec (P18517), and MHV-A59 (P19738); EToV pp1a, C-terminal fragment of EToV pp1a, S11237; HRoV VP3, VP3 of human rotavirus, BAA84964; ARoV VP3, VP3 of avian rotavirus PO-13, BAA24128. (D) Ecoli o177, putative polyprotein of *Escherichia coli*, AAC74129; Hsap Y1268a, KIAA1268 protein of *Homo sapiens*, BAA86582; Hsap H2A1.1, histone macroH2A1.1 of *Homo sapiens*, AAC33434; yeast YMX7, hypothetical 32.1 kDa protein of yeast, Q04299; yeast YBN2, hypothetical 19.9 kDa protein of yeast, P38218. (E) Yeast YBR1, putative ribosomal RNA methyltransferase (rRNA (uridine-2′-O-)-methyltransferase) of yeast, P38238; yeast SPB1, putative rRNA methyltransferase SPB1 of yeast, P25582; yeast YGN6, putative ribosomal RNA methyltransferase YGL136c (rRNA (uridine-2′-O-)-methyltransferase) of yeast, P53123; Ecoli FTSJ, cell division protein of *Escherichia coli*, NP_417646.
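The asterisk and colon annotation described for the inter-segment row can be sketched in a few lines. The example below uses the five amino acid similarity groups listed in the caption. The function names and toy sequences are hypothetical, and treating any column that contains a gap as unmarked is an assumption made for the sketch rather than a rule stated in the caption.

```python
# Similarity groups as listed in the caption: (i) to (v).
SIMILARITY_GROUPS = [set("DENQ"), set("ST"), set("KR"), set("FWY"), set("ILMV")]


def column_mark(column: tuple[str, ...]) -> str:
    """Return '*' for identical residues, ':' for similar ones, ' ' otherwise."""
    residues = set(column)
    if "-" in residues:  # assumption: columns with gaps are left unmarked
        return " "
    if len(residues) == 1:
        return "*"
    if any(residues <= group for group in SIMILARITY_GROUPS):
        return ":"
    return " "


def conservation_row(seqs: list[str]) -> str:
    """Build the annotation row for a list of equal-length aligned sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return "".join(column_mark(col) for col in zip(*seqs))


# Toy aligned fragments, invented for illustration only.
toy_alignment = ["DEKG", "DNRA", "DQKS"]
row = conservation_row(toy_alignment)
print(repr(row))
```

Here the first column is identical in all three sequences, the second and third fall within groups (i) and (iii) respectively, and the fourth is not conserved.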

**Figure 5**
Nidoviruses encode homologs of cellular enzymes involved in RNA processing. (A) The cellular pathways for processing of pre-U16 snoRNA and pre-tRNA splicing are summarized, with relevant enzymatic activities indicated. For details, see the text. Homologs of the highlighted enzymes have been identified in nidoviruses (see also Figure 1 and the text). (B) Table summarizing the conservation of homologs of the cellular enzymes presumably involved in RNA processing in SARS-CoV and different nidovirus groups.





