Viruses. 2020 Apr 20;12(4):462.
doi: 10.3390/v12040462.

Synonymous Dinucleotide Usage: A Codon-Aware Metric for Quantifying Dinucleotide Representation in Viruses

Spyros Lytras et al. Viruses.

Abstract

Distinct patterns of dinucleotide representation, such as CpG and UpA suppression, are characteristic of certain viral genomes. Recent research has uncovered vertebrate immune mechanisms that select against specific dinucleotides in targeted viruses. This evidence highlights the importance of systematically examining the dinucleotide composition of viral genomes. We have developed a novel metric, called synonymous dinucleotide usage (SDU), for quantifying dinucleotide representation in coding sequences. Our method compares the abundance of a given dinucleotide to the null hypothesis of equal synonymous codon usage in the sequence. We present a Python 3 package, DinuQ, for calculating SDU and other relevant metrics. We have applied this method to two sets of invertebrate- and vertebrate-specific flaviviruses and rhabdoviruses. The SDU shows that, in both datasets, the vertebrate viruses exhibit consistently greater under-representation of CpG dinucleotides in all three codon positions. Unlike existing metrics for dinucleotide quantification, the SDU allows for a statistical interpretation of its values by comparing them to a null expectation based on the codon table. Here we apply the method to viruses, but coding sequences of other organisms can be analysed in the same way.
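The comparison described in the abstract (observed dinucleotide abundance against the expectation under equal synonymous codon usage) can be illustrated with a minimal sketch. This is not the DinuQ implementation or its API: the function name `sdu_within_codon`, its parameters, and the weighting by amino acid count are assumptions made for illustration, and only the two within-codon frame positions are handled (the bridge position between adjacent codons is left out). The standard genetic code is assumed.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Map each amino acid to its list of synonymous codons.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def sdu_within_codon(seq, dinuc="CG", offset=1):
    """Hypothetical helper: SDU-style ratio for one within-codon frame position.

    offset=0 looks at codon positions 1-2, offset=1 at codon positions 2-3.
    Returns the amino-acid-weighted mean of observed/expected, where
    'expected' assumes all synonymous codons are used equally.
    """
    seq = seq.upper().replace("U", "T")
    dinuc = dinuc.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

    total = Counter()  # occurrences of each amino acid
    hits = Counter()   # occurrences encoded by a codon carrying the dinucleotide
    for codon in codons:
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":
            continue  # skip ambiguous codons and stop codons
        total[aa] += 1
        if codon[offset:offset + 2] == dinuc:
            hits[aa] += 1

    weighted, n_informative = 0.0, 0
    for aa, n in total.items():
        syn = SYNONYMS[aa]
        expected = sum(c[offset:offset + 2] == dinuc for c in syn) / len(syn)
        if expected == 0 or expected == 1:
            continue  # non-informative: synonymous choice cannot change the count
        observed = hits[aa] / n
        weighted += n * observed / expected
        n_informative += n
    return weighted / n_informative if n_informative else float("nan")
```

Under these assumptions, a sequence that uses the four alanine codons equally, such as `GCGGCTGCCGCA`, gives a value of 1.0 for CpG at codon positions 2-3, while `GCGGCG` gives 4.0 (over-representation) and `GCTGCT` gives 0.0 (complete suppression).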

Keywords: CpG suppression; Flaviviridae; Rhabdoviridae; bioinformatics; dinucleotides; python package; synonymous codon usage.

Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Figures

Figure 1
Comparison of error for the SDU_CpGbridge of 10 simulated amino acid sequences of different lengths with 1000 random samples of nucleotide sequences for each amino acid sequence: (A) Standard deviation of the mean of the SDU error distributions; (B) Violin plots of the error distribution for each simulated sequence.
Figure 2
RDA (a), SDU (b) and RSDU (c) values for all informative dinucleotides and frame positions plotted for the APOIV coding sequence. Dots indicate observed values and violin plots indicate SDU/RSDU error distributions around the null hypothesis (1000 random samples for each value). The grey horizontal line indicates an RDA of 1. Position 1 of the dinucleotides CpC, CpA, GpC, GpG, GpU, GpA, UpG, UpA, ApC, ApU and ApA is excluded because it can only produce one amino acid (non-informative).
Figure 3
RDA (a), SDU (b) and RSDU (c) values for all informative dinucleotides and frame positions plotted for the AEFV coding sequence. Dots indicate observed values and violin plots indicate SDU/RSDU error distributions around the null hypothesis (1000 random samples for each value). The grey horizontal line indicates an RDA of 1. Position 1 of the dinucleotides CpC, CpA, GpC, GpG, GpU, GpA, UpG, UpA, ApC, ApU and ApA is excluded because it can only produce one amino acid (non-informative).
Figure 4
Comparison of SDU_CpG for all frame positions between invertebrate- and vertebrate-specific (A) Flaviviridae and (B) Rhabdoviridae, plotted against the overall GC content of the coding sequences (Supplementary Table S1).
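The error distributions mentioned in the captions of Figures 1 to 3 come from random nucleotide sequences sampled for a fixed amino acid sequence. A rough sketch of that idea follows, assuming each codon is drawn uniformly from its synonymous set (the null hypothesis of equal synonymous codon usage) and scoring only CpG at codon positions 2-3. The helper names `sdu_pos2` and `null_sdu_samples` are hypothetical, and the package's actual sampling procedure may differ.

```python
import random
import statistics
from collections import defaultdict
from itertools import product

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def sdu_pos2(codons, dinuc="CG"):
    """Hypothetical helper: weighted observed/expected ratio at codon positions 2-3."""
    by_aa = defaultdict(list)
    for codon in codons:
        by_aa[CODON_TABLE[codon]].append(codon)
    weighted, n_informative = 0.0, 0
    for aa, used in by_aa.items():
        syn = SYNONYMS[aa]
        expected = sum(c[1:] == dinuc for c in syn) / len(syn)
        if expected in (0.0, 1.0):
            continue  # non-informative amino acid for this dinucleotide
        observed = sum(c[1:] == dinuc for c in used) / len(used)
        weighted += len(used) * observed / expected
        n_informative += len(used)
    return weighted / n_informative if n_informative else float("nan")


def null_sdu_samples(protein, n_samples=1000, seed=0):
    """Sample synonymous recodings of `protein` uniformly and score each one."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        codons = [rng.choice(SYNONYMS[aa]) for aa in protein]
        samples.append(sdu_pos2(codons))
    return samples


# Usage: the null distribution should centre near 1 and narrow with length.
short_null = null_sdu_samples("A" * 50, n_samples=500, seed=1)
long_null = null_sdu_samples("A" * 800, n_samples=500, seed=1)
print(statistics.mean(long_null), statistics.stdev(short_null), statistics.stdev(long_null))
```

An observed value can then be compared against the spread of the sampled values for the same amino acid sequence. The spread shrinks as the sequence gets longer, which matches the dependence on sequence length that Figure 1 examines.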
