Review
Nat Rev Genet. 2018 May;19(5):269-285.
doi: 10.1038/nrg.2017.117. Epub 2018 Mar 26.

Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations


Jesse J Salk et al. Nat Rev Genet. 2018 May.

Abstract

Mutations, the fuel of evolution, are first manifested as rare DNA changes within a population of cells. Although next-generation sequencing (NGS) technologies have revolutionized the study of genomic variation between species and individual organisms, most have limited ability to accurately detect and quantify rare variants among the different genome copies in heterogeneous mixtures of cells or molecules. We describe the technical challenges in characterizing subclonal variants using conventional NGS protocols and the recent development of error correction strategies, both computational and experimental, including consensus sequencing of single DNA molecules. We also highlight major applications for low-frequency mutation detection in science and medicine, describe emerging methodologies and provide our vision for the future of DNA sequencing.

Figures

Figure 1. The signal-to-noise problem
The accuracy of all analytical measurements, DNA sequencing included, depends on the ratio between the true value and the precision of the detection method. This is analogous to the noisiness of a digital camera image: at a low signal-to-noise ratio, an image is indecipherable (a), but with increasing sensor quality (b–d) the image becomes progressively recognizable as a face and then as a specific individual.
Figure 2. Methods of consensus-based error correction on short-read platforms
a | Safe Sequencing System (SafeSeqS) uses randomly generated molecular barcodes carried by PCR primers (coloured thick bars) to reduce errors by independently labelling each single-stranded DNA molecule, thus allowing identification of derivative copies. True mutations (circles) can be discerned from sequencing errors or late PCR errors (crosses) because the latter occur only in a subset of identically labelled duplicate reads. PCR errors that occur during the first cycle of amplification (triangles) can be propagated to all duplicates and escape error correction.
b | Single-Molecule Molecular Inversion Probes (smMIPs) entail two targeting arms joined by a linker that contains a molecular barcode. The molecules are hybridized with single-stranded DNA and then extended and ligated to form closed loops, which are amplified and sequenced. Consensus-based error correction is similar to SafeSeqS, and similarly susceptible to first-cycle amplification artefacts.
c | Circular Sequencing (CircSeq) entails circularization of single-stranded DNA fragments without any molecular barcodes, followed by rolling-circle amplification, fragmentation and sequencing of short stretches of concatemerized fragments. The molecular fragmentation points of the starting molecules serve as unique molecular identifiers (UMIs) for consensus-based error correction. As with other single-stranded consensus methods, recurrent amplification errors may fail to be identified and corrected.
d | UMI-tailed adapters can be ligated to a library to uniquely mark each single strand. Despite both strands in a complex being tagged, no means is provided to relate the consensus of one strand to that of its mate for comparison, and early PCR errors (triangles) may go unrecognized.
e | CypherSeq circularizes double-stranded DNA molecules using a single adapter molecule containing double-stranded molecular barcodes. Targeted enrichment is achieved with rolling-circle amplification using primers directed to each DNA strand. Although information from both strands may contribute to consensus making, lack of asymmetry between the two strands makes it impossible to discern whether one or both strands successfully amplified. Recurrent early amplification errors (triangles) can escape error correction when only one strand's worth of data is successfully recovered, because this cannot be recognized.
f | Duplex Sequencing (DupSeq) allows true duplex error correction on high-throughput short-read sequencing platforms by applying molecular barcodes to each double-stranded DNA molecule in such a way that amplification products of the two strands can be informatically related to each other (thick coloured bars), but also distinguished (blue versus green strands). After tagging, derivative PCR products are grouped by molecular barcode and by strand. A consensus is made for each strand group and then compared with that of the complementary strand. True mutations (circles) can be confidently distinguished from both sequencing errors and late PCR errors (crosses) as well as first-round PCR errors (triangles), because complementary errors are extremely unlikely to occur by chance at the same position on both DNA strands.
See the main text for a detailed description of each method.
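The grouping-and-comparison logic described for Duplex Sequencing can be illustrated with a short Python sketch. It is a toy model under stated assumptions, not the published pipeline: reads are assumed to be already aligned, equal in length and written in reference orientation, and the function names, the strand labels "A" and "B", and the thresholds `min_reads` and `min_fraction` are all hypothetical.

```python
from collections import Counter, defaultdict


def strand_consensus(reads, min_reads=3, min_fraction=0.7):
    """Majority-vote consensus for reads derived from one tagged strand.

    Returns None when the family is too small (hypothetical threshold).
    Positions without a sufficiently dominant base are masked as "N".
    """
    if len(reads) < min_reads:
        return None
    consensus = []
    for column in zip(*reads):  # one column per aligned position
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(reads) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(tagged_reads, **thresholds):
    """Compare the two single-strand consensuses of each barcoded molecule.

    tagged_reads: iterable of (barcode, strand, sequence), strand "A" or "B".
    A position is kept only if both strand consensuses agree; otherwise "N".
    """
    families = defaultdict(lambda: defaultdict(list))
    for barcode, strand, sequence in tagged_reads:
        families[barcode][strand].append(sequence)

    result = {}
    for barcode, strands in families.items():
        top = strand_consensus(strands.get("A", []), **thresholds)
        bottom = strand_consensus(strands.get("B", []), **thresholds)
        if top is None or bottom is None:
            continue  # without both strands, duplex correction is impossible
        result[barcode] = "".join(
            a if a == b else "N" for a, b in zip(top, bottom)
        )
    return result


# Illustrative data only (reference assumed to be "ACGT").
example_reads = (
    # bc1: G>T change on both strands, plus one late error in a single read
    [("bc1", "A", "ACTT")] * 3
    + [("bc1", "B", "ACTT")] * 3
    + [("bc1", "B", "ACTA")]
    # bc2: change present on every copy of one strand only,
    # as expected for a first-cycle PCR error
    + [("bc2", "A", "AGGT")] * 3
    + [("bc2", "B", "ACGT")] * 3
)
consensus = duplex_consensus(example_reads)
print(consensus)  # {'bc1': 'ACTT', 'bc2': 'ANGT'}
```

In this example the variant in family `bc1` is supported by both strands and is retained, while the single-read error is voted out. The change in family `bc2` appears on one strand only, so the position is masked, which a single-strand consensus alone could not do.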
Figure 3. Methods of single-molecule sequencing consensus-based error correction
a | The INC-Seq method begins by circularizing double-stranded DNA fragments followed by rolling-circle amplification of the loop; each product is a long DNA strand comprising concatenated copies of one of the strands of the starting molecule. This is sequenced on a long-read platform.
b | 2D nanopore sequencing involves ligation of a hairpin adapter to one end of a duplex DNA molecule followed by tandem nanopore sequencing of the linked original strands.
c | SMRTbell sequencing entails ligation of hairpin adapters to each end of a molecule, followed by direct sequencing of the closed loop on the long-read Pacific Biosciences (PacBio) platform. Both strands are sequenced together in multiple passes.
In all cases, consensus sequences incorporate data from both DNA strands.
Figure 4. Impact of error correction technology on detection sensitivity
The positive predictive value (the expected number of correct positive calls divided by the total number of positive calls) is plotted as a function of the variant allele frequency in a molecular population for each sequencing method of a specified error rate. As seen by curve overlap, nearly all mutant calls will be correct using any method if the frequency of detected variants is greater than 1/10. However, the error rates of standard Illumina sequencing and single-stranded tag-based error correction result in critical losses in positive predictive value at variant frequencies of ~1/100 and ~1/1,000, respectively. The extremely low error rate conferred by Duplex Sequencing enables confident identification of variants below 1/100,000 (dotted line).
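The dependence of positive predictive value on variant frequency and error rate can be approximated with a few lines of Python. This is a simplified sketch, not the authors' calculation: it assumes every true variant copy is called, errors are independent, and `error_rate` is the per-position probability that a wild-type copy is misread as the variant. The function name and the error rates in the loop are illustrative placeholders rather than values taken from the figure.

```python
def positive_predictive_value(variant_frequency, error_rate):
    """Expected correct positive calls divided by all positive calls.

    Simplified model (assumption): a fraction `variant_frequency` of copies
    is truly mutant and always called; each remaining wild-type copy yields
    a false call with probability `error_rate`.
    """
    if not 0.0 < variant_frequency <= 1.0:
        raise ValueError("variant_frequency must be in (0, 1]")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be in [0, 1]")
    true_calls = variant_frequency
    false_calls = (1.0 - variant_frequency) * error_rate
    return true_calls / (true_calls + false_calls)


# Placeholder error rates spanning several orders of magnitude.
for error_rate in (1e-3, 1e-5, 1e-7):
    for frequency in (1e-1, 1e-3, 1e-5):
        ppv = positive_predictive_value(frequency, error_rate)
        print(f"error rate {error_rate:.0e}, frequency {frequency:.0e}: PPV {ppv:.3f}")
```

Under this model the positive predictive value falls to about one half when the variant frequency equals the error rate, which is consistent with the pattern the caption describes: a method remains reliable only for variants well above its own error rate.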
Figure 5. Applications of rare variant detection
a | Cancer. Genetic heterogeneity within tumours is thought to be responsible for the emergence of therapeutic resistance. In lung adenocarcinomas with certain epidermal growth factor receptor (EGFR) mutations, under treatment with targeted inhibitors, drug resistance mutations arise at low levels and then clonally expand.
b | Cell-free tumour DNA. Tumour cells release fragments of DNA into plasma and other body fluids that can be sampled via ‘liquid biopsy’. This serves as a non-invasive means of determining the genetic makeup of a tumour without a physical biopsy and is a sensitive way to detect minimal residual disease and early relapse.
c | Circulating fetal DNA. Placental-derived DNA in the maternal circulation can be used to non-invasively detect fetal genetic traits or abnormalities.
d | Fetal microchimerism. Fetal cells that engraft into a mother may persist many years after birth. These cells have important immunological consequences.
e | Immunological mosaicism. Somatic V(D)J recombination and hypermutation in B and T cells create heterogeneity that helps the body adapt defences to new infectious and neoplastic threats.
f | Antimicrobial resistance. Low-frequency variants in single-cell populations can be responsible for drug-resistance outbreaks.
g | Metagenomics. Complex mixtures of microorganisms exist throughout the living world. The human body is colonized with symbiotic microbes, and in some diseases health problems can arise from disrupted microbial diversity.
h | Forensics. Mixtures of human tissues are routinely recovered at crime scenes or natural disasters. In some scenarios the abundance of one individual’s DNA may be much greater than that of the other.
i | Mutational exposure. DNA damage can be caused by normal ageing as well as carcinogens. Very-low-frequency mutation load may be proportional to future cancer risk.
j | Ageing. DNA damage occurs throughout life from exogenous and endogenous processes. Low-frequency mutations in both the nuclear and mitochondrial genome (the latter is shown here) may play a role in certain age-related pathologies besides cancer, such as neurodegeneration and autoimmunity. Subclonal mutations might serve as a biomarker of disease risk or even longevity.

