2022 Oct 31;13(1):6524.
doi: 10.1038/s41467-022-34211-x.

MIR retrotransposons link the epigenome and the transcriptome of coding genes in acute myeloid leukemia

Aristeidis G Telonis et al. Nat Commun.

Abstract

DNMT3A and IDH1/2 mutations combinatorically regulate the transcriptome and the epigenome in acute myeloid leukemia; yet the mechanisms of this interplay are unknown. Using a systems approach within topologically associating domains, we find that genes with significant expression-methylation correlations are enriched in signaling and metabolic pathways. The common denominator across these methylation-regulated genes is the density in MIR retrotransposons of their introns. Moreover, a discrete number of CpGs overlapping enhancers are responsible for regulating most of these genes. Established mouse models recapitulate the dependency of MIR-rich genes on the balanced expression of epigenetic modifiers, while projection of leukemic profiles onto normal hematopoiesis ones further consolidates the dependencies of methylation-regulated genes on MIRs. Collectively, MIR elements on genes and enhancers are susceptible to changes in DNA methylation activity and explain the cooperativity of proteins in this pathway in normal and malignant hematopoiesis.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. DNA methylation as a regulator of gene expression in signaling and metabolic genes.
a Volcano plots showing the statistical significance of each mCpG-mRNA correlation vs. the correlation coefficient. b Examples of expression-methylation correlations. The graph at the top-right corner depicts the position of the mCpG with respect to the gene locus. The 5’-most as well as the closest transcriptional start sites downstream and upstream to the mCpG are noted. Box plots show the median (center), 25–75 percentile (box), and 5–95 percentile (whisker) from n = 16, 9 and 11 DNMT3A, IDH1/2 and double mutant samples, respectively. c Histograms of the distance between the mCpG and the gene for all pairs tested (left) and for the significant positive (middle) and negative (right) correlations in the Glass et al. cohort. Note the differences in the scales of the Y axes. Dashed lines indicate the distance cutoffs for classifying correlations. Asterisks indicate significant enrichments in the positive (n = 243; p value < 10⁻⁵) or negative (n = 15,904; p value < 10⁻⁵) proximal correlations (one-sided Chi-squared test). d Circos plot of the TAD containing the HOXB cluster illustrating the correlations of HOXB5 and of SNX11 with mCpGs in the Glass et al. cohort. The two tracks are symmetric; the top visualizes the genes and the bottom the mCpGs. The Venn diagram shows the number of mCpGs correlated with each gene. e Scatter plot of fold enrichment in the mCpGs commonly correlated with two genes against the respective FDR (one-sided Hypergeometric test). Each dot represents a gene pair, e.g., HOXB5 and SNX11. f Bar plot showing the enrichment of enhancers in the mCpGs with significant correlations (n = 10,959 for Glass et al. and n = 3549 for TCGA). Asterisks indicate p value < 10⁻⁴ (one-sided Hypergeometric test). g Pathways significantly enriched or depleted in the gene list ranked by correlation strength per GSEA. The full list of pathways is included in Supplementary Data 3. h Visualization of lipid metabolism genes with significant expression-methylation correlations. Two genes are connected if they use or produce the same metabolite. Genes are colored based on the distance bin, with green prevailing over purple. Source data are provided as a Source Data file.
Fig. 2. Antithetical architecture of introns of genes with coupled vs. uncoupled expression-methylation.
a Description of the information presented on the plots. Specifically, plots show the difference between the cumulative distribution of the background genes (dashed gray line) and the cumulative distribution of the respective gene set. b–e Architectural parameters of the introns of the genes with significant expression-methylation correlations at proximal (n = 413 for Glass et al. and 347 for TCGA), intermediate (n = 1749 and 680) or long range (n = 1344 and 478), and of the 1000 genes with the weakest expression-methylation correlations, i.e., the W gene sets. Plots show the difference between the cumulative distribution of the background genes (dashed gray line) and the cumulative distribution of the respective gene set. Positive values mean a shift of the distribution toward higher values while negative values indicate a shift of the distribution toward lower values. Plots depict analyses of GC content (b), evolutionary conservation (c) and repetitive element densities of MIR (d) and Alu (e) elements, for the genes with non-zero respective densities. f Evolutionary conservation of the mCpGs that are correlated with genes at proximal, intermediate, or long range. g Heatmap illustrating fold enrichment of the mCpGs in repetitive elements. For b–f asterisks and crosses indicate statistical significance at a p value threshold of 10⁻⁴ and 10⁻², respectively, per Kolmogorov–Smirnov tests (two-sided). For f, hypergeometric tests (one-sided) were used and a p value threshold of 2 × 10⁻². All p values are listed in Supplementary Data 4. Source data are provided as a Source Data file.
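The cumulative-distribution difference described in this caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names (`ecdf`, `cdf_difference`, `ks_statistic`) and the toy numbers are assumptions introduced here. It computes the background empirical CDF minus the gene-set empirical CDF, so that positive values correspond to a gene set shifted toward higher values, and it takes the largest absolute difference as the two-sample Kolmogorov–Smirnov statistic.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function x -> fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def cdf_difference(background, gene_set):
    """Background ECDF minus gene-set ECDF, evaluated at every pooled value.

    A positive difference at x means fewer gene-set values lie at or below x
    than expected from the background, i.e. the gene set is shifted higher.
    """
    f_bg, f_gs = ecdf(background), ecdf(gene_set)
    grid = sorted(set(background) | set(gene_set))
    return [(x, f_bg(x) - f_gs(x)) for x in grid]


def ks_statistic(background, gene_set):
    """Two-sample Kolmogorov-Smirnov statistic: max absolute ECDF difference."""
    return max(abs(d) for _, d in cdf_difference(background, gene_set))


# Hypothetical toy values (e.g. an intronic density per gene), not real data.
background = [1, 2, 3, 4]
gene_set = [3, 4, 5, 6]
curve = cdf_difference(background, gene_set)
statistic = ks_statistic(background, gene_set)
```

In this toy case every point of `curve` is non-negative, which is how a gene set shifted toward higher values would appear above the dashed background line.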
Fig. 3. Enrichment of DNA-binding proteins and multiprotein complexes (DBP/Cs) in genes with intermediate- and long-range correlations.
a Heatmaps showing the enrichment/depletion of transcription factor binding in CD34+ cells. The value represents the Z-score of the observed overlap of ChIP-seq peaks with reference to a simulated expected distribution (see Methods for details). Asterisks indicate an absolute Z-score greater than 10. b Bar plot showing the number of DBP/Cs from ENCODE significantly enriched (Z-score > 10) in each gene set. Heatmap showing the enrichment/depletion of selected DBP/Cs from ENCODE enriched in the genes with intermediate- and long-range correlations but not in the ones with proximal correlations. Asterisks and notations are the same as in a. c–e Plots showing the number of genes (c), mCpGs (d) or genes containing at least one MIR element (e) that are bound by up to N transcription factors (x axis). This is shown in a cumulative manner, e.g., up to 20 DBP/Cs bind a total of 393 unique genes with significant intermediate or long-range correlations (c). Red dashed lines show the number of all respective genes or mCpGs with significant correlations. f Plot showing the Z-scores of ENCODE’s DBP/Cs (same as in b) in gene space (X axis) against the respective Z-scores in MIR space (Y axis). Red lines demarcate the significance thresholds. The same protein can be seen twice as it refers to the analysis based on different gene sets (marked by different colors). g, h Bar plots showing the enrichment of the interactions between DBP/Cs in the mCpG-gene pairs with significant methylation-expression correlations (n = 16,142). The first protein binds the mCpG and the second binds at the gene (g) or a MIR element within the intronic space of the gene (h). For both plots, p values were calculated with a hypergeometric test (one-sided) and corrected to FDR; only the significant results (FDR < 5%) are shown. All Z-scores and DBP/C interaction pairs are included in Supplementary Data 5. Source data are provided as a Source Data file.
Fig. 4. DNA methylation changes on MIR, Alu and L1 elements.
a Pie charts showing the percentage of differentially methylated cytosines (DMCs) overlapping the respective elements. b–d Pie charts showing the enrichment of protein-coding genes around the DMCs with reference to all cytosines included in the analysis, statistically evaluated with chi-squared tests. e Fold enrichment of the proximal or W gene sets overlapping the DMCs of each repetitive element. f Fold enrichment in enhancers overlapping the DMCs of each repetitive element. For e, f, p values are noted on the figures, asterisks indicate enrichment or depletion with p value < 10⁻⁵ per hypergeometric test (one-sided) for n = 3685, n = 10,162 and n = 1585 DMCs overlapping MIR, Alu or L1 elements, respectively. Source data are provided as a Source Data file.
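As a rough illustration of the fold-enrichment and one-sided hypergeometric calculations named in this caption, the sketch below uses only the Python standard library. It is not the authors' pipeline: the function names and the small counts are hypothetical, chosen so the arithmetic can be checked by hand.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """One-sided enrichment p value P(X >= k) for a hypergeometric X.

    population: all items tested (e.g. all cytosines in the analysis)
    successes:  items in the population carrying the annotation (e.g. enhancers)
    draws:      size of the selected subset (e.g. DMCs on one element type)
    k:          annotated items observed within the selected subset
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


def fold_enrichment(k, population, successes, draws):
    """Observed annotated fraction in the subset over the background fraction."""
    return (k / draws) / (successes / population)


# Hypothetical counts: 10 items, 4 annotated, 3 selected, 2 of those annotated.
p_value = hypergeom_upper_tail(2, 10, 4, 3)
fold = fold_enrichment(2, 10, 4, 3)
```

For a depletion test the lower tail P(X <= k) would be summed instead; with real DMC counts a multiple-testing correction would normally follow.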
Fig. 5. Mouse models recapitulate the dependency of MIR-rich genes on proper Dnmt3a and/or Idh2 function.
Difference between the cumulative distribution of the background genes (dashed gray line) and the cumulative distribution of the up- or downregulated genes after Dnmt3a KO (left column), Idh2 R140Q mutation (middle column) or both Dnmt3a KO and Idh2 R140Q (right column) in GC content (a), evolutionary conservation (b), MIR density (c) and Alu density (d). Positive values mean a shift of the distribution toward higher values while negative values indicate a shift of the distribution toward lower values. Asterisks and crosses indicate statistical significance at a p value threshold of 10⁻⁴ and 10⁻², respectively, per Kolmogorov–Smirnov tests (two-sided). All p values are listed in Supplementary Data 4. Source data are provided as a Source Data file.
Fig. 6. AML subtype differences can be projected on MIR-biased expression trajectories of normal hematopoiesis.
a PCA plot of the ranked-normalized normal and AML expression profiles. For simplicity, only AML samples are plotted, while ellipses are fitted to show the space occupied by each lineage. For each AML subtype, the median projection per principal component was calculated and plotted as a square. The exact coordinates of each sample are included in Supplementary Data 6. Significance Analysis of Microarrays (SAM) plots showing the statistically significant differences in the distances between Double and DNMT3A (b) or IDH mutant samples (c). Each circle represents a normal sample. If the difference between the observed score and the expected, as calculated after random permutations, is larger than the threshold delta corresponding to an FDR of 5%, then the distance of the DNMT3A mutant samples (b) or the IDH mutant samples (c) is significantly different from the distance of the double mutants. Samples that exceed this threshold are colored orange, green or yellow. d–g Enrichment in cell types on the samples significantly closer (n = 43 samples and n = 61 samples for DNMT3A or IDH1/2 mutants, respectively) or further (n = 74 samples for IDH1/2 mutants) to the double mutants as compared to the single mutants. Colors match the respective groups of b, c. Asterisks indicate statistical significance per hypergeometric test (one-sided). h Heatmap showing the overlap of differentially expressed genes between IDH1/2 and DNMT3A AML mutants with the DE genes across normal hematopoiesis. i Heatmap showing the architectural biases in the differentially expressed genes of each cell type as compared to HSCs in normal human hematopoiesis. Baso basophils, CMP common myeloid progenitor cells, Eosin eosinophils, GMP granulocyte-monocyte progenitor cells, Gran granulocytes, HSC hematopoietic stem cell, Mega megakaryocytes, MEP megakaryocyte-erythroid progenitor cells, Mono monocytes, NK natural killer cells, NKT NK T cells. Source data are provided as a Source Data file.

References

    1. Cancer Genome Atlas Research Network et al. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 2013;368:2059–2074. doi: 10.1056/NEJMoa1301689.
    2. Figueroa ME, et al. Leukemic IDH1 and IDH2 mutations result in a hypermethylation phenotype, disrupt TET2 function, and impair hematopoietic differentiation. Cancer Cell. 2010;18:553–567. doi: 10.1016/j.ccr.2010.11.015.
    3. Ward PS, et al. The common feature of leukemia-associated IDH1 and IDH2 mutations is a neomorphic enzyme activity converting alpha-ketoglutarate to 2-hydroxyglutarate. Cancer Cell. 2010;17:225–234. doi: 10.1016/j.ccr.2010.01.020.
    4. Xu W, et al. Oncometabolite 2-hydroxyglutarate is a competitive inhibitor of alpha-ketoglutarate-dependent dioxygenases. Cancer Cell. 2011;19:17–30. doi: 10.1016/j.ccr.2010.12.014.
    5. Glass JL, et al. Epigenetic identity in AML depends on disruption of nonpromoter regulatory elements and is affected by antagonistic effects of mutations in epigenetic modifiers. Cancer Discov. 2017;7:868–883. doi: 10.1158/2159-8290.CD-16-1032.
