Abstract
Identifying structural variation (SV) is essential for genome interpretation but has historically been difficult because of limitations inherent in available genomic technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with the unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and propose that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.
Eisfeldt, J. et al. Comprehensive structural variation genome map of individuals carrying complex chromosomal rearrangements. PLOS Genet. 15, e1007858 (2019).
Dutta, U. R. et al. Breakpoint mapping of a novel de novo translocation t(X;20)(q11.1;p13) by positional cloning and long read sequencing. Genomics 111, 1108–1114 (2019).
Sherman, R. M. et al. Assembly of a pan-genome from deep sequencing of 910 humans of African descent. Nat. Genet. 51, 30–35 (2019).
Zhou, B. et al. Extensive and deep sequencing of the Venter/HuRef genome for developing and benchmarking genome analysis tools. Sci. Data 5, 180261 (2018).
Levy, S. et al. The diploid genome sequence of an individual human. PLOS Biol. 5, e254 (2007).
Miga, K. H. et al. Telomere-to-telomere assembly of a complete human X chromosome. bioRxiv https://doi.org/10.1101/735928 (2019).
Wang, Y.-C. et al. High-coverage, long-read sequencing of Han Chinese trio reference samples. Sci. Data 6, 91 (2019).
Zook, J. M. et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci. Data 3, 160025 (2016).
Acknowledgements
The authors thank Y. Wang, W. Zhou, A. Weber and B. Zhou for their valuable comments and help with proofreading the manuscript. S.S.H. was supported through the Michigan Predoctoral Training in Genetics grant (T32 GM007544). A.E.U. acknowledges funding by the National Institutes of Health (NIH) and the Simons Foundation, and is a Tashia and John Morgridge Faculty Scholar of the Stanford Child Health Research Institute.
Author information
Contributions
S.S.H. and R.E.M. researched the literature and wrote the article. All authors provided substantial contributions to discussions of the content, and reviewed and/or edited the manuscript before submission.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information
Nature Reviews Genetics thanks C. Alkan, F. Sedlazeck and M. Talkowski for their contribution to the peer review of this work.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Glossary
- Structural variations
-
(SVs). Operationally defined as sequence variants >50 bp in size. The most recognized forms of SV include deletions, duplications, inversions, insertions and translocations.
- Complex rearrangements
-
Structural variants that consist of multiple structural variant types nested or clustered with one another.
- Read signatures
-
Specific marks that result from reads that map discordantly to the reference genome.
- Short-read HTS
-
(Short-read high-throughput sequencing). Standard sequencing where libraries are fragmented to ~600–800 bp in length. Two ends are sequenced ~100–250 bp with an unsequenced insert size of ~100–600 bp.
- Flow cells
-
Glass slides containing fluidic channels for sequencing reactions to occur.
- Microfluidics
-
Devices that precisely manipulate and control small amounts of fluids.
- SV callers
-
Algorithms designed to detect structural variations (SVs). Each putative SV detected by a caller is an individual ‘call’. The term ‘call’ derives from computer science, where it means to invoke a particular task; each detected SV is the result of one such task.
- Sensitivity
-
The ability to detect known variants correctly. Low sensitivity implies low ability to detect bona fide variants.
- Reference data sets
-
High-resolution structural variation data sets typically derived from de novo genome assemblies, population-scale sequencing or projects employing multiple orthogonal detection methods. Reference sets are used to benchmark detection algorithms and determine the novelty and rarity of structural variation calls.
- Ensemble algorithm
-
A detection method that combines the resulting call sets from multiple independent algorithms.
- False-discovery rate
-
The expected proportion of calls in the final call set that are marked as true but are actually false.
- Coordinate overlap
-
The number of base pairs shared between the genomic coordinates of two different variant calls.
- Purifying selection
-
A process of natural selection where strongly deleterious alleles are selectively removed from a population.
- Phased SVs
-
(Phased structural variations). Variants that are assigned to a parental (maternal or paternal) haplotype, often computed using family trio or heterozygous single-nucleotide variant data.
- Receiver operating characteristic curves
-
Plots of the true positive rate against the false positive rate showing the relationship between sensitivity and specificity.
- Connected-molecule strategies
-
Genomic methods that connect shorter reads of a DNA molecule together to provide long-range information.
- Sequence coverage
-
The average number of times a given locus is covered by a sequence read.
- Physical coverage
-
The average number of times a given locus is covered by the cumulative length of the reads, including unsequenced inserts.
- Single-molecule strategies
-
Genomic methods that read the entirety of long strands of DNA.
- Specificity
-
The ability to detect the absence of variants correctly. Low specificity implies many false positives.
- Base-calling error
-
Errors in determining the respective nucleotide from raw signals during sequencing.
- Circular consensus sequencing
-
A single-molecule real-time (SMRT) sequencing method that improves accuracy through multiple passes of the template molecule.
- Hybrid assembly
-
A genome assembly that leverages sequencing data from multiple platforms to reconstruct the original sequence, using the orthogonal data to extend the contig lengths or to bridge contigs to one another.
- N50
-
The contig length at which contigs of that length or longer together contain 50% of the assembled nucleotide sequence. A larger N50 implies a more contiguous assembly.
- Topologically associating domain
-
A spatial partition of the genome where segments within these domains are enriched for interactions with each other when compared with interactions with segments outside the domain.
- Allelic bias
-
Gene expression that is biased towards one allele over the other.
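Several of the quantitative terms above (N50, coordinate overlap, sensitivity, specificity and false-discovery rate) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not a method taken from the article: the function names, the half-open `(start, end)` interval convention and the example counts are all hypothetical. In practice, true negatives are often poorly defined for SV calling, so specificity is shown only for completeness.

```python
from typing import Dict, Sequence, Tuple


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the contig length L such that contigs of length >= L
    together contain at least half of the assembled bases."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def coordinate_overlap(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Base pairs shared by two calls on the same chromosome.
    Intervals are assumed to be half-open (start, end) coordinates."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def call_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity and false-discovery rate from counts of
    true/false positives and negatives (counts are assumed non-degenerate)."""
    return {
        "sensitivity": tp / (tp + fn),   # fraction of real variants detected
        "specificity": tn / (tn + fp),   # fraction of non-variants left uncalled
        "fdr": fp / (tp + fp),           # fraction of calls that are false
    }


if __name__ == "__main__":
    print(n50([80, 70, 50, 40, 30, 20]))
    print(coordinate_overlap((100, 200), (150, 300)))
    print(call_metrics(tp=90, fp=10, fn=30, tn=870))
```

Sorting contigs from longest to shortest makes the N50 definition explicit: the function stops at the first contig whose cumulative length reaches half of the assembly.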
About this article
Cite this article
Ho, S.S., Urban, A.E. & Mills, R.E. Structural variation in the sequencing era. Nat Rev Genet 21, 171–189 (2020). https://doi.org/10.1038/s41576-019-0180-9
DOI: https://doi.org/10.1038/s41576-019-0180-9