Review
Biochem Soc Trans. 2024 Jun 26;52(3):1431-1447.
doi: 10.1042/BST20231322.

Viral genome sequencing methods: benefits and pitfalls of current approaches

Natasha Jansz et al. Biochem Soc Trans. 2024.

Abstract

Whole genome sequencing of viruses provides high-resolution molecular insights, enhancing our understanding of viral genome function and phylogeny. Beyond fundamental research, viral sequencing is increasingly vital for pathogen surveillance, epidemiology, and clinical applications. As sequencing methods rapidly evolve, the diversity of viral genomics applications and catalogued genomes continues to expand. Advances in long-read, single-molecule, real-time sequencing methodologies present opportunities to sequence contiguous, haplotype-resolved viral genomes in a range of research and applied settings. Here we present an overview of nucleic acid sequencing methods and their applications in studying viral genomes. We emphasise the advantages of different viral sequencing approaches, with a particular focus on the benefits of third-generation sequencing technologies in elucidating viral evolution, transmission networks, and pathogenesis.

Keywords: DNA sequencing; genomics; virology.

Conflict of interest statement

The authors declare that there are no competing interests associated with the manuscript.

Figures

Figure 1. Principles underlying common sequencing methods.
(A) Chain termination sequencing makes use of dideoxynucleotides (ddNTPs). ddNTPs are similar in structure to deoxynucleotides (dNTPs) but lack the 3′ hydroxyl group. The ddNTPs may be radioactively or fluorescently labelled. When a ddNTP is incorporated into a DNA strand, DNA synthesis stops. In a sequencing reaction, dNTPs are present in excess and chain elongation proceeds normally until DNA polymerase adds a labelled ddNTP, arresting elongation. (B) Following the sequencing reaction, the products of varying lengths are separated by either gel (left) or capillary electrophoresis (right) and visualised by autoradiography or fluorescence to infer the DNA sequence. (C) Illumina dye sequencing is a second-generation sequencing-by-synthesis approach that involves fragmenting DNA inputs and ligating sequencing adapters to the ends of the fragments. The fragments then hybridise to a solid flow cell via the adapter sequences, where they are amplified into clonal clusters that serve as sequencing templates. The sequencing reaction includes fluorescently labelled dNTPs. As each base is incorporated into the newly synthesised strand, the flow cell is imaged and the specific emission of each cluster is recorded to identify the newly incorporated base. The fluorescently labelled nucleotides act as ‘reversible terminators’: each carries a removable block that halts extension after a single incorporation, and the block and label are cleaved after imaging, enabling the next round of dNTP incorporation. (D) SMRT HiFi sequencing is a single-molecule, long-read sequencing technology. Circularised DNA fragments are prepared and washed over a nanofluidic chip containing millions of wells called zero-mode waveguides (ZMWs). A single circularised DNA molecule is bound by a DNA polymerase (red) immobilised at the bottom of a ZMW, where labelled nucleotides are incorporated into a newly synthesised strand. SMRT-seq uses nucleotides carrying a fluorescent label on the phosphate chain rather than on the base. Each incorporation is detected in real time from the fluorescence of its label, which is released with the cleaved phosphate chain, allowing the DNA sequence in each ZMW to be inferred. (E) Nanopore sequencing is a direct, real-time, single-molecule, long-read sequencing method. Nanopore flow cells contain an array of transmembrane nanopores (green) embedded in an electrically resistant membrane (blue). Each nanopore connects to an electrode, which measures the electric current flowing through the nanopore. When a nucleic acid molecule is guided through a nanopore by a helicase (navy blue), the current is disrupted, producing a characteristic ‘squiggle’. The nucleic acid sequence can then be inferred from the squiggle in real time using neural network-based basecalling algorithms.
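The read-out logic of panels (A), (B) and (D) can be made concrete with a minimal Python sketch under simplifying assumptions. The helper names `read_chain_termination` and `circular_consensus` are hypothetical and not part of any sequencing software. The first function mimics electrophoresis by ordering chain-terminated products by length. The second assumes, as in HiFi (circular consensus) sequencing, that the polymerase reads the same circularised molecule several times, and takes a per-position majority vote over passes that are assumed to be already aligned and of equal length; production consensus tools are considerably more sophisticated.

```python
from collections import Counter


def read_chain_termination(products):
    """Infer the newly synthesised strand from chain-termination products.

    `products` is a list of (length, terminal_base) pairs: the length of
    each terminated strand and the labelled ddNTP that stopped it.
    Ordering by length, as electrophoresis does, spells the strand 5'->3'.
    """
    by_length = {}
    for length, base in products:
        # Each product length should map to a single terminating base.
        if by_length.setdefault(length, base) != base:
            raise ValueError(f"conflicting bases at length {length}")
    return "".join(by_length[length] for length in sorted(by_length))


def circular_consensus(passes):
    """Per-position majority vote over repeated passes of one molecule.

    Simplifying assumption: passes are already aligned and equal in length.
    Ties go to the base seen first at that position.
    """
    if not passes or len({len(p) for p in passes}) != 1:
        raise ValueError("expected one or more passes of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Four terminated products, listed out of order as they might be collected.
print(read_chain_termination([(3, "G"), (1, "A"), (4, "T"), (2, "C")]))  # prints ACGT
# Three noisy passes; each error appears in only one pass.
print(circular_consensus(["ACGTA", "ACGTT", "ACCTA"]))  # prints ACGTA
```

The majority vote shows why repeated passes help: an error confined to a single pass is outvoted by the others.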
Figure 2. The number of NCBI Virus nucleotide records (y-axis) released over time (x-axis) from 1982 to 2023 [6].
Total deposited viral sequences are plotted in blue, and all SARS-CoV-2 nucleotide sequences deposited on NCBI from 2020 to 2023 are plotted in red.
Figure 3. High-throughput approaches to map proviral integration sites in the host genome.
(A) 3′ junction amplification approaches were first used to map the junctions of proviral integration sites of retroviruses such as HTLV-1 and HIV in the human genome. DNA extracted from infected cells is fragmented by restriction enzymes or sonication and then ligated to DNA linkers (purple). Integration sites can be amplified using one primer that binds to the 3′ viral LTR promoter and another that binds the linker. PCR products can then be sequenced by capillary electrophoresis or NGS. (B) DNA probe capture can enrich proviral integrants for NGS. A set of biotinylated DNA probes (green circle) is designed to tile the proviral genome. Probes that bind to the 5′ or 3′ end of the proviral genome will often enrich for the junction of the integration site within the host genome. Infected genomic DNA is prepared for NGS using standard library preparation procedures. The libraries are then mixed with the proviral-specific biotinylated probes for hybridisation. Streptavidin-coated magnetic beads are used to isolate the proviral DNA fragments and integration site junctions, which can then be subjected to NGS. (C) Multiple-displacement amplification single genome sequencing (MDA-SGS) resolves near-full-length proviral sequences and also maps the integration site junction. DNA extracted from infected cells is diluted to a proviral endpoint, so that individual proviruses and their integration sites can be amplified independently. MDA is catalysed by phi29 DNA polymerase; from the MDA product, near-full-length (NFL) proviral genomes can be amplified by nested PCR and sequenced by capillary electrophoresis or long-read sequencing. Integration sites can be amplified by 3′ junction amplification and sequenced by NGS. (D) PCIP-seq leverages selective cleavage of circularised DNA fragments carrying proviral DNA with a pool of CRISPR guide RNAs, followed by inverse long-range PCR and long-read sequencing. Genomic DNA isolated from infected cells is sheared to approximately the length of the proviral genome. Intramolecular ligation is performed to create circular DNA, and the remaining linear DNA is digested by nucleases. The circular DNA containing proviral sequences is selectively linearised by CRISPR-mediated cleavage (orange) targeting regions adjacent to the 5′ and 3′ LTRs (black arrows). Inverse long-range PCR is then performed to amplify the proviral integration site and proviral genome, followed by long-read sequencing.
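Two steps in panels (A) and (C) lend themselves to a short calculation, sketched here in Python under stated assumptions. `single_template_fraction` assumes proviral templates are Poisson-distributed across endpoint-diluted reactions and returns the share of positive reactions expected to hold exactly one provirus. `integration_sites` is a hypothetical, simplified parser for 3′ junction amplification reads: it assumes each read contains the viral LTR end followed by host DNA, trims a placeholder LTR-end sequence, and tallies the adjacent host flanks. Both function names and the example sequences are illustrative only, not taken from the review or any published pipeline; real pipelines also align flanks to the host genome and account for PCR duplicates.

```python
import math
from collections import Counter


def single_template_fraction(fraction_positive):
    """Share of positive reactions expected to contain exactly one provirus.

    Assumes templates are Poisson-distributed across reactions with mean
    lam, so P(positive) = 1 - exp(-lam) and P(exactly one) = lam * exp(-lam).
    """
    if not 0 < fraction_positive < 1:
        raise ValueError("fraction_positive must lie strictly between 0 and 1")
    lam = -math.log(1 - fraction_positive)  # mean proviruses per reaction
    return lam * math.exp(-lam) / fraction_positive


def integration_sites(reads, ltr_end, flank_len=20):
    """Count reads supporting each host flank next to a viral LTR end.

    Hypothetical simplification: a read is kept only if it contains
    `ltr_end`; the `flank_len` bases that follow are taken as host DNA.
    Read counts are inflated by PCR, so they are not clone sizes.
    """
    sites = Counter()
    for read in reads:
        start = read.find(ltr_end)
        if start == -1:
            continue  # no viral LTR end: not a junction read
        flank_start = start + len(ltr_end)
        flank = read[flank_start:flank_start + flank_len]
        if len(flank) == flank_len:  # skip reads too short to resolve the site
            sites[flank] += 1
    return sites


# If 30% of endpoint-diluted reactions are positive, about 83% of the
# positive reactions are expected to carry a single provirus.
print(round(single_template_fraction(0.3), 2))  # prints 0.83

# Placeholder LTR end and toy reads, for illustration only.
reads = ["GGTCTAGCAAAACCCGGG", "TCTAGCAAAACCCGGGTT",
         "TCTAGCATTTGGGCCC", "ACGTACGT"]
print(integration_sites(reads, "TCTAGCA", flank_len=6))
```

The Poisson calculation illustrates why endpoint dilution works: as the fraction of positive reactions falls, a larger share of those positives derive from a single provirus, at the cost of more empty reactions.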

References

    1. Jou, W.M., Haegeman, G., Ysebaert, M. and Fiers, W. (1972) Nucleotide sequence of the gene coding for the bacteriophage MS2 coat protein. Nature 237, 82–88. doi:10.1038/237082a0
    2. Sanger, F. and Coulson, A.R. (1975) A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J. Mol. Biol. 94, 441–448. doi:10.1016/0022-2836(75)90213-2
    3. Sanger, F., Air, G.M., Barrell, B.G., Brown, N.L., Coulson, A.R., Fiddes, C. et al. (1977) Nucleotide sequence of bacteriophage φX174 DNA. Nature 265, 687–695. doi:10.1038/265687a0
    4. Sanger, F., Coulson, A.R., Hong, G.F., Hill, D.F. and Petersen, G.B. (1982) Nucleotide sequence of bacteriophage λ DNA. J. Mol. Biol. 162, 729–773. doi:10.1016/0022-2836(82)90546-0
    5. NCBI Resource Coordinators (2018) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 46, D8–D13. doi:10.1093/nar/gkx1095