Infect Genet Evol. 2021 Aug:92:104858.
doi: 10.1016/j.meegid.2021.104858. Epub 2021 Apr 18.

Systematizing the genomic order and relatedness in the open reading frames (ORFs) of the coronaviruses

Sailen Barik. Infect Genet Evol. 2021 Aug.

Abstract

The coronaviruses (CoVs), including SARS-CoV-2, the agent of the ongoing deadly COVID-19 pandemic (Coronavirus disease-2019), represent a highly complex and diverse class of RNA viruses with large genomes, a complex gene repertoire, and intricate transcriptional and translational mechanisms. The 3'-terminal one-third of the genome encodes four structural proteins, namely spike, envelope, membrane, and nucleocapsid, interspersed with genes for accessory proteins that are largely nonstructural and are called 'open reading frame' (ORF) proteins, with alphanumerical designations that do not follow a consistent or sequential order. Here, I report a comparative study of these ORF proteins, which are mainly encoded in two gene clusters, i.e., between the Spike and the Envelope genes, and between the Membrane and the Nucleocapsid genes. For brevity and focus, greater emphasis was placed on the first cluster, collectively designated the 'orf3 region' for ease of referral. Overall, an apparently diverse set of ORFs, such as ORF3a, ORF3b, ORF3c, ORF3d, ORF4 and ORF5, though not necessarily numbered in that order on all CoV genomes, was analyzed along with other ORFs. Unexpectedly, neither the gene order nor the naming of the ORFs was fully conserved, even among the members of one Genus. These studies also revealed hitherto unrecognized orf genes in alternative translational frames, encoding potentially novel polypeptides as well as some that are highly similar to known ORFs. Finally, several options for an inclusive and systematic numbering are proposed, not only for the orf3 region but also for the other orf genes in the viral genome, in an effort to regularize the apparently confusing names and orders. Regardless of the ultimate acceptability of one system over the others, it is hoped that this treatise will initiate an informed discourse in this area.

Keywords: Accessory genes; Coronaviridae; Coronavirus; Open reading frame; Phylogeny; RNA genome.
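
As an illustration of what searching for ORFs in alternative translational frames involves, the following is a minimal Python sketch that scans the three forward reading frames of a nucleotide sequence for ATG-to-stop stretches. It is not the method used in the study; the function name `find_orfs`, the `min_codons` threshold, and the restriction to the standard ATG start and the three standard stop codons are assumptions made for illustration. Frames here are numbered 0, 1, 2 by offset from the start of the input sequence, which is not the same convention as the frame 0/−1/−2 labels used in the figure legends.

```python
# Hypothetical helper for illustration only; not taken from the paper.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (frame, start, end) for each ATG...stop ORF on the forward strand.

    frame is the offset (0, 1 or 2) from the start of seq, start is the
    0-based index of the A in ATG, and end is the index just past the
    stop codon. min_codons counts codons before the stop codon.
    """
    # Accept RNA input by converting U to T.
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon within this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # excludes the stop codon
                if n_codons >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs
```

Calling `find_orfs(genome_segment, min_codons=30)` on a region such as the stretch between two structural genes would list candidate ORFs in every forward frame, including overlapping ones. Candidates found this way would still need homology searches or experimental evidence before being treated as genes.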

Conflict of interest statement

None.

Figures

Graphical abstract
Fig. 1
Sequence homology among E proteins from selected coronaviruses, confirming that the E sequences group according to genus; the genera are color-coded for easy viewing (Alpha = red; Beta = blue; Gamma = green; Delta = pink). The results match the phylogeny based on CoV RdRP (Cui et al., 1994). Virus names and abbreviations are listed in Table 1 (Section 3.2). Alignment was performed as described in Materials and Methods, and the branch lengths are indicated in the rooted guide tree. For consistency, these same viral strains were used as reference in all analyses in this study, using the same color codes.
Fig. 2
Homology-based phylogeny in the ‘orf3 region’ of the three major CoV genera, color-coded as in Fig. 1 (Alpha = red; Beta = blue; Gamma = green). Examples of genus-specific clusters, both interrupted by another Genus, are marked by boxes. Viral name abbreviations are in Table 1, and details are in Section 3.4.
Fig. 3
Comparative ORF locations in the ‘orf3 region’ of two major CoV genomes, depicting the currently used names. Virus names are as in the previous Figures. The genetic map of a CoV is schematically shown on top, not drawn to scale. When multiple translational frames were used, they are indicated by different thickness and shade: the main frame (frame 0), taken as the one used by the most 5′-proximal orf in the genome (left side in the diagram; usually ORF3/3a, 3b, etc.), is in black, and the −1 and −2 frames are progressively lighter. The ORF lengths were drawn approximately to scale, except when a long sequence was truncated in the middle in order to fit it in the available space. For these truncated ORFs, only the left terminus, and not the right, is properly positioned. To indicate the actual lengths, the amino acid (aa) numbers encoded in all ORFs are shown in parentheses. ORF overlaps are indicated by placing them in different tracks (over one another); however, due to space constraints, specific tracks could not be assigned to each translational frame. Orthologous ORFs are connected by green lines, and any dissimilar ORFs between them were skipped and indicated by broken black line segments. Further details are in Section 3.5. Note that all viruses in this Figure are also contained in Fig. 2, with the sole exception of AFCD307, as it is essentially identical to AFCD62.
Fig. 4
Identity of two unnamed CoV ORFs, viz. 133-X4 (A) and Rp3-X (B). The predicted amino acid sequences were used as query in BLAST, and representative orthologous ORFs are shown in alignment. Note that the homologs are annotated with various ORF numbers in various viruses even when they are highly related, such as the 133 and HKU4 family viruses, both betacoronaviruses from bats (Table 1). The NCBI accession number of Civet010 ORF4 is AAU04651.1.
Fig. 5
Phylogeny of newly mined CoV ORFs (ORFX#). The conceptually translated sequences of hitherto uncharacterized ORFs in the set of CoVs were subjected to multiple sequence alignment and the homology tree drawn as described in Materials and Methods. The two Genera are color-coded as before (Alpha = red; Beta = blue). Note that the similarity branches of the total ORFX sequences do not cluster by viral Genus, but there is a trend of local clustering within each genus.
Fig. 6
Proposed new nomenclature for the orf3 region. This is drawn in the same format as Fig. 3 for easy comparison between the two, and shows the proposed new numbering of the ORFs in a subset of genomes for illustration purposes. As in Fig. 3, a schematic of the ORF locations on a generic CoV genome is shown on top, with brief rationales above it, and detailed in Section 3.6. The extra ORF numbers, although not currently required, have been added for contingency, such that if new ORFs are discovered and added to any of these regions in the future, they will not affect the downstream ORF numbers. Some current ORFs between M and N actually overlap with the N sequence, which is also indicated for ORF 13–20 in this region.

References

    1. Adato O., Ninyo N., Gophna U., Snir S. Detecting horizontal gene transfer between closely related taxa. PLoS Comput. Biol. 2015;11 doi: 10.1371/journal.pcbi.1004408.
    1. Banerjee A.K., Barik S. Gene expression of vesicular stomatitis virus genome RNA. Virology. 1992;188:417–428. doi: 10.1016/0042-6822(92)90495-b.
    1. Banerjee A.K., Barik S., De B.P. Gene expression of nonsegmented negative strand RNA viruses. Pharmacol. Ther. 1991;51:47–70. doi: 10.1016/0163-7258(91)90041-j.
    1. Baranov P.V., Henderson C.M., Anderson C.B., Gesteland R.F., Atkins J.F., Howard M.T. Programmed ribosomal frameshifting in decoding the SARS-CoV genome. Virology. 2005;332:498–510. doi: 10.1016/j.virol.2004.11.038.
    1. Barik S. Respiratory syncytial virus mechanisms to interfere with type 1 interferons. Curr. Top. Microbiol. Immunol. 2013;372:173–191. doi: 10.1007/978-3-642-38919-1_9.