Sci Transl Med. 2024 Mar 13;16(738):eadj9283.
doi: 10.1126/scitranslmed.adj9283. Epub 2024 Mar 13.

Genome-wide repeat landscapes in cancer and cell-free DNA

Akshaya V Annapragada et al. Sci Transl Med.

Abstract

Genetic changes in repetitive sequences are a hallmark of cancer and other diseases, but characterizing these has been challenging using standard sequencing approaches. We developed a de novo kmer finding approach, called ARTEMIS (Analysis of RepeaT EleMents in dISease), to identify repeat elements from whole-genome sequencing. Using this method, we analyzed 1.2 billion kmers in 2837 tissue and plasma samples from 1975 patients, including those with lung, breast, colorectal, ovarian, liver, gastric, head and neck, bladder, cervical, thyroid, or prostate cancer. We identified tumor-specific changes in these patients in 1280 repeat element types from the LINE, SINE, LTR, transposable element, and human satellite families. These included changes to known repeats and 820 elements that were not previously known to be altered in human cancer. Repeat elements were enriched in regions of driver genes, and their representation was altered by structural changes and epigenetic states. Machine learning analyses of genome-wide repeat landscapes and fragmentation profiles in cfDNA detected patients with early-stage lung or liver cancer in cross-validated and externally validated cohorts. In addition, these repeat landscapes could be used to noninvasively identify the tissue of origin of tumors. These analyses reveal widespread changes in repeat landscapes of human cancers and provide an approach for their detection and characterization that could benefit early detection and disease monitoring of patients with cancer.

Conflict of interest statement

Competing interests: A.V.A., R.B.S., and V.E.V. are inventors on patent applications submitted by Johns Hopkins University related to genome-wide repeat landscapes in cancer and cfDNA (US Patent application number 63/532,642). A.V.A., D.C.B., V.A., D.M., Z.H.F., J.P., and R.B.S. are inventors on patent applications submitted by Johns Hopkins University related to cell-free DNA for cancer detection that have been licensed to Delfi Diagnostics. J.R.W. is the founder and owner of Resphera Biosciences LLC and serves as a consultant to Personal Genome Diagnostics Inc. and Delfi Diagnostics Inc. C.C. is the founder and owner of CMCC Consulting. J.P., V.A., and R.B.S. are founders of Delfi Diagnostics, and V.A. and R.B.S. are consultants for this organization. V.E.V. is a founder of Delfi Diagnostics, serves on the board of directors and as an officer for this organization, and owns Delfi Diagnostics stock, which is subject to certain restrictions under university policy. In addition, Johns Hopkins University owns equity in Delfi Diagnostics. V.E.V. divested his equity in Personal Genome Diagnostics (PGDx) to LabCorp in February 2022. V.E.V. is an inventor on patent applications submitted by Johns Hopkins University related to cancer genomic analyses and cell-free DNA for cancer detection that have been licensed to one or more entities, including Delfi Diagnostics, LabCorp, QIAGEN, Sysmex, Agios, Genzyme, Esoterix, Ventana, and ManaT Bio. Under the terms of these license agreements, the university and inventors are entitled to fees and royalty distributions. V.E.V. is an advisor to Viron Therapeutics and Epitope. These arrangements have been reviewed and approved by the Johns Hopkins University in accordance with its conflict-of-interest policies. The remaining authors declare that they have no competing interests.

Figures

Fig. 1. Overview of ARTEMIS method.
De novo identification of kmers revealed ~1.2 billion unique kmers spanning 1280 distinct repeat elements. These elements represent six families: transposable elements, SINEs, satellites, LTRs, LINEs, and RNA elements. In an individual sample, the kmer repeat landscape is defined as the sum of the counts of all kmers comprising each repeat type identified in all sequence reads, normalized by coverage. These landscapes are used in machine learning to generate an ARTEMIS score for disease characterization and prediction.
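The landscape definition in this caption (per-repeat-type sums of kmer counts over all reads, normalized by coverage) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name, the kmer length `k`, the kmer-to-repeat lookup table, and the way coverage is supplied are all hypothetical choices made for illustration.

```python
from collections import Counter


def kmer_repeat_landscape(reads, kmer_to_repeat, k, coverage):
    """Sum the counts of all kmers assigned to each repeat type across
    all reads, then divide each sum by the sample coverage.

    kmer_to_repeat is a hypothetical lookup table (kmer -> repeat type);
    coverage is whatever normalizing depth value the caller supplies.
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    totals = Counter()
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            repeat_type = kmer_to_repeat.get(read[i:i + k])
            if repeat_type is not None:
                totals[repeat_type] += 1
    return {name: count / coverage for name, count in totals.items()}


# Toy data only: invented kmers and repeat-type labels.
toy_map = {"ACGT": "L1_toy", "CGTA": "L1_toy", "TTTT": "SAT_toy"}
toy_reads = ["ACGTA", "TTTTT", "GGGGG"]
landscape = kmer_repeat_landscape(toy_reads, toy_map, k=4, coverage=2.0)
```

Each entry of the returned dictionary is one feature of the landscape; a vector of such features per sample is the kind of input the caption describes feeding into machine learning.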
Fig. 2. Kmer repeat landscapes across human cancers reveal widespread differences from normal tissues.
(A) The heatmap shows the ratio of kmer repeat landscapes for each PCAWG tumor as compared with its matched normal, revealing high numbers of tumor-specific changes that can be correlated to genomic instability metrics (n = 469 tumor/normal pairs representing all PCAWG samples with genomic instability metrics available). Each PCAWG tumor is listed along the y axis, and each individual repeat element type is along the x axis. Ratios greater than one (red) indicate an increase in the element in the tumor, whereas ratios less than one (blue) indicate a decrease in the tumor. Most of these identified changes are in elements (820 of 1280) with no prior evidence for changes in cancer, as shown in yellow in the evidence bar along the x axis. (B) The plot shows elements from all six repeat element families ordered by the Benjamini-Hochberg–corrected P value of the Wilcoxon signed-rank test comparing the overlap of repeat elements with tumor-specific structural breakpoints versus their overlap with randomly selected genomic regions. Filled circles indicate elements newly implicated in cancer through this study, whereas open circles indicate elements with prior evidence for involvement in cancer. Red circles indicate elements depleted of breakpoints, and yellow circles indicate elements enriched for breakpoints. TEs, transposable elements. (C) Box plots show the distribution of tumor:normal kmer count ratios for repeat element types overlapping the ERBB2 region (1 Mb) in PCAWG breast cancers (n = 91 tumor/normal pairs). Ratios for each patient for all elements with >0.5% of kmers found in the region are shown (left), and the Benjamini-Hochberg–corrected P value from the Wilcoxon signed-rank test is plotted for each comparison, with points in red indicating P < 0.05 (right). Element names in bold indicate those newly implicated in cancer through this study. 
(D) Box plots show the ratios of tumor:normal kmer counts for kmers occurring within LINE-1–mediated deletions in PCAWG lung tumors containing at least one LINE-1–mediated deletion (n = 5; data file S1). (E) Kaplan-Meier plots of overall survival and progression-free survival of PCAWG tumors of AJCC (American Joint Committee on Cancer) stage III or IV (n = 167) stratified into two groups based on predicted ARTEMIS scores. The group shown in blue had ARTEMIS scores below the median value, and the group shown in red had ARTEMIS scores above the median value.
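Panels (B) and (C) report Benjamini-Hochberg–corrected P values. As a reminder of what that correction does, here is a minimal pure-Python sketch of the standard step-up adjustment. The function name and the toy P values are illustrative assumptions and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each P value is scaled by m / rank (rank 1 = smallest), and a running
    minimum taken from the largest rank downward keeps the adjusted
    values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy P values, invented for illustration.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_p)
significant = [q < 0.05 for q in toy_adjusted]
```

An element would then be flagged (for example, drawn in red in panel C) when its adjusted value falls below the chosen cutoff, here 0.05.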
Fig. 3. Kmer repeat landscapes capture tumor-specific changes in the plasma.
(A) Top: Each bar plot shows for a given human satellite 2 or 3 element type the percentage of its kmer occurrences found on chrY (dark blue) and on all other chromosomes (light blue) in the chm13 reference. Bottom: In individuals without cancer (n = 158), the distribution of coverage-normalized kmer counts in cfDNA for these satellite types in males (n = 87) and females (n = 71). P values for the Wilcoxon signed-rank test are shown at the top of each plot. (B) Kmer counts for PCAWG tissue (top; n = 54 liver, n = 48 lung squamous, and n = 38 lung adenocarcinoma tumor/normal pairs) and plasma cfDNA (bottom; n = 75 patients with liver cancer and n = 133 patients without cancer; n = 29 patients with lung squamous cell cancer and n = 158 patients without cancer; n = 62 patients with lung adenocarcinoma and n = 158 patients without cancer). The top five features with significant differences in both tissue and plasma, and at least 1000 expected kmers per million aligned reads are shown for each cancer type as separate plots. P values are shown at the top of each plot and were calculated by the Wilcoxon signed-rank test.
Fig. 4. Impact of epigenetic state on repeat element representation in cfDNA.
(A) A summary of peaks per megabase of each chromatin state for each histone type is indicated at the top of each plot. The peak density is scaled within each chromatin immunoprecipitation sequencing experiment to account for different numbers of peaks in each experiment. PC, polycomb. (B) Box plots show the proportion of histone peaks of each type (columns) in each of 1280 repeat elements organized into six families. (C) In plasma from patients without cancer (n = 158) in the LUCAS cohort, the distributions of aligned fragment sizes for fragments overlapping each histone mark and all fragments are plotted. The line is the median, and the shading indicates ±1 SD, plotted as a difference in distributions. (D) In plasma from patients without cancer (n = 158) in the LUCAS cohort, plots of coverage genome-wide versus within regions of each histone mark are shown. The x axis represents the log average coverage, and the y axis represents the log difference in count. (E) In plasma from patients without cancer (n = 158) in the LUCAS cohort, box plots show the ratios of average observed to expected kmer counts for the features in the top and bottom deciles of histone mark density. P values for the Wilcoxon signed-rank test are shown above each plot.
Fig. 5. ARTEMIS and ARTEMIS-DELFI for detection of lung cancer using cfDNA.
(A) Distributions of ARTEMIS and joint ARTEMIS-DELFI scores for patients with (n = 129) and without (n = 158) cancer in the cross-validated LUCAS cohort separated by biopsy status (individuals without cancer), cancer stage, and histology. SCLC, small cell lung cancer. (B) ROC analyses of ARTEMIS and ARTEMIS-DELFI scores classifying individuals with and without lung cancer in the full LUCAS cohort and in subgroups by cancer stage. (C) The sensitivity and specificity achieved by ARTEMIS and ARTEMIS-DELFI in the external validation cohort at locked score thresholds that achieved 50 to 80% specificity in the cross-validated cohort.
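The idea in panel (C) of fixing ("locking") a score threshold at a target specificity in one cohort and then evaluating it unchanged in another can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's procedure: the function names, the convention that scores strictly above the threshold are called positive, and all score values are hypothetical.

```python
import math


def lock_threshold(noncancer_scores, target_specificity):
    """Pick the smallest observed cutoff at which at least the target
    fraction of non-cancer scores fall at or below it.

    Assumed convention: a score strictly above the cutoff is a positive call.
    """
    ordered = sorted(noncancer_scores)
    # The small epsilon guards against floating-point overshoot before ceil.
    n_needed = max(math.ceil(target_specificity * len(ordered) - 1e-9), 1)
    return ordered[n_needed - 1]


def sensitivity_specificity(cancer_scores, noncancer_scores, threshold):
    """Evaluate a fixed threshold on a separate cohort."""
    true_pos = sum(s > threshold for s in cancer_scores)
    true_neg = sum(s <= threshold for s in noncancer_scores)
    return true_pos / len(cancer_scores), true_neg / len(noncancer_scores)


# Invented scores: threshold chosen on one cohort, applied to another.
training_noncancer = [0.1, 0.2, 0.3, 0.4]
threshold = lock_threshold(training_noncancer, target_specificity=0.5)
sens, spec = sensitivity_specificity(
    cancer_scores=[0.15, 0.35, 0.9],
    noncancer_scores=[0.05, 0.25, 0.1, 0.2],
    threshold=threshold,
)
```

Because the threshold is never re-tuned on the second cohort, the specificity observed there can differ from the target, which is the quantity an external validation is meant to expose.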
