Epigenetics Chromatin. 2016 Sep 29;9:42.
doi: 10.1186/s13072-016-0095-z. eCollection 2016.

EpiMINE, a computational program for mining epigenomic data

SriGanesh Jammula et al. Epigenetics Chromatin.

Abstract

Background: In epigenetic research, the increasing ease of high-throughput sequencing and a growing interest in genome-wide studies have led to a rapid accumulation of epigenetics-related data in the public domain. This creates an opportunity to explore data beyond the limits of any specific query-centred study. Such data must first undergo standard primary analyses, for which multiple well-established programs are available. Further downstream analyses, such as genome-wide comparative, correlative and quantitative analyses, are critical for deciphering key biological features. However, these analyses are currently accessible only to computational researchers, and platforms capable of handling, analysing and linking multiple interdisciplinary datasets with efficient analytical methods are lacking.

Results: Here, we present EpiMINE, a program for mining epigenomic data. It is a user-friendly, stand-alone computational program that supports multiple datasets and performs genome-wide correlative and quantitative analysis of ChIP-seq and RNA-seq data. Using data available from the ENCODE project, we illustrate several features of EpiMINE through different biological scenarios to show how easily some known observations can be verified. These results highlight how these approaches can help in identifying novel biological features.

Conclusions: EpiMINE performs different kinds of genome-wide quantitative and correlative analyses, using ChIP-seq- and RNA-seq-related datasets. Its framework enables it to be used by both experimental and computational researchers. EpiMINE can be downloaded from https://sourceforge.net/projects/epimine/.

Keywords: ChIP-seq; Chromatin immunoprecipitation; Correlation; NGS; Quantification; RNA-seq.

Figures

Fig. 1
Preferential enrichment, coexistence and correlation analysis. a Barplot representing the proportion of H3K27ac-positive promoters (in green), H3K27ac-positive enhancers (in red) and random regions (in blue) bound by different factors. b Heatmap showing the presence (dark blue) or absence (light blue) of different factors in Bcl11a-binding regions. The closer a factor is placed to Bcl11a, the greater the coexistence. c Heatmap showing the presence (dark blue) or absence (light blue) of different factors in promoters of the top 3000 highly expressed genes. The closer a factor is placed to Promo, the greater the coexistence. d Genome-wide correlation between different factors along all promoters of the human genome. e Variable plot showing different factors and their degree of correlation with others along all promoters of the human genome, across the first two principal components
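
The genome-wide correlation in panel d can be illustrated with a minimal sketch. It assumes each factor has already been summarized as one normalized signal value per promoter and uses Pearson correlation; the abstract does not state which correlation measure EpiMINE applies, and the factor names and numbers below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signal vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(signals):
    """Pairwise correlation between all factors, returned as nested dicts."""
    names = list(signals)
    return {a: {b: pearson(signals[a], signals[b]) for b in names} for a in names}


# Hypothetical normalized signal of three factors over the same five promoters.
signals = {
    "H3K4me3": [10.0, 8.0, 6.0, 4.0, 2.0],
    "H3K27ac": [5.0, 4.0, 3.0, 2.0, 1.0],
    "H3K27me3": [1.0, 2.0, 3.0, 4.0, 5.0],
}
matrix = correlation_matrix(signals)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

In this toy input the two active marks rise and fall together across promoters, while the repressive mark follows the opposite trend, which is the kind of pattern such a correlation heatmap makes visible.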
Fig. 2
Quantification in and around ROI. a Heatmap with genome-wide-based normalized intensities for different histone modifications and RNA-PolII in H3K27ac (top panel)- and H3K27me3 (bottom panel)-positive regions. b Expression level of target genes in each cluster identified in a. Top panel represents expression levels for target gene clusters for H3K27ac regions; lower panel represents H3K27me3-positive regions. c Intensities of H3K27ac ChIP in a 5-kb region surrounding the centre of enhancer regions, across five different cell lines. d Expression levels of target genes in Gm12878 in clusters identified in c
Fig. 3
Average profile and spike-in-based normalization. a Average profile of H3K4me3 in promoter regions of genes classified on the basis of expression levels (high to low). b Average profile of H3K36me3 in gene bodies of genes classified on the basis of expression levels (high to low). c Intensities of H3K79me2 within 10 kb surrounding TSS (both up- and downstream) in regions possessing H3K79me2 in WT samples, and its fate in other samples treated with different levels of inhibitor, without a reference genome. d Intensities of H3K79me2 within 10 kb surrounding TSS (both up- and downstream) in regions possessing H3K79me2 in WT samples, and its fate in other samples treated with different levels of inhibitor, with a reference genome. e Average profile of H3K79me2 within 10 kb surrounding TSS (both up- and downstream) in regions possessing H3K79me2 in WT samples, and its fate in other samples treated with different levels of inhibitor, without a reference genome. f Average profile of H3K79me2 within 10 kb surrounding TSS (both up- and downstream) in regions possessing H3K79me2 in WT samples, and its fate in other samples treated with different levels of inhibitor, with a reference genome
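
The contrast between panels c/e (without a reference genome) and d/f (with a reference genome) can be sketched as follows. This is a minimal illustration of one common spike-in scheme, scaling by the number of reads that map to the exogenous reference genome, and is not taken from EpiMINE itself; the function names and read counts are hypothetical.

```python
def reads_per_million(region_count, total_reads):
    """Depth normalization only: region reads per million mapped reads."""
    return region_count * 1_000_000 / total_reads


def spike_in_normalized(region_count, spike_in_reads):
    """Spike-in normalization: region reads per million reference-genome reads."""
    return region_count * 1_000_000 / spike_in_reads


# Hypothetical counts. Both samples have the same depth and the same raw count
# in the region, but the treated sample has four times as many spike-in reads,
# as expected when the target mark is lost globally.
samples = {
    "WT": {"region": 400, "total": 20_000_000, "spike_in": 1_000_000},
    "inhibitor": {"region": 400, "total": 20_000_000, "spike_in": 4_000_000},
}

for name, s in samples.items():
    depth_only = reads_per_million(s["region"], s["total"])
    with_reference = spike_in_normalized(s["region"], s["spike_in"])
    print(name, depth_only, with_reference)
```

With depth normalization alone the two samples look identical, whereas scaling by the spike-in reads exposes a fourfold reduction in the treated sample. This is why a global loss of a mark can be invisible without a reference genome.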
Fig. 4
Differential analysis. a Volcano plot representing significantly enriched promoters (marked in green) harbouring different levels of H3K4me3 in skeletal muscle as compared to keratinocytes. b Distribution of expression levels of genes with promoters that show significantly higher levels of H3K4me3 in skeletal muscle as compared to those of keratinocytes. c Distribution of expression levels of genes with promoters that show significantly higher levels of H3K4me3 in keratinocytes as compared to those of skeletal muscle. d Significantly enriched promoters with respect to H3K4me3 across eight different cell lines. Represented here are their intensities in standard z-score form. e Expression level of target genes in each cluster across eight different cell lines identified in d
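
The standard z-score form mentioned for panel d can be sketched as a per-promoter transformation across cell lines. The sketch below assumes the population standard deviation and maps a constant row to zeros; both choices are assumptions, and the input values are hypothetical.

```python
from statistics import mean, pstdev


def row_zscores(values):
    """Convert one promoter's intensities across cell lines to z-scores."""
    mu = mean(values)
    sd = pstdev(values)  # population standard deviation (an assumption)
    if sd == 0:
        # A promoter with identical signal everywhere carries no contrast.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical H3K4me3 intensities of one promoter in three cell lines.
z = row_zscores([2.0, 4.0, 6.0])
print([round(v, 3) for v in z])
```

Scaling each promoter separately puts strongly and weakly marked promoters on a common scale, so a clustered heatmap reflects the pattern across cell lines and not the absolute signal.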
Fig. 5
Predicting dependencies and characterizing ROI. a Bayesian network showing dependency between different factors in compact chromatin regions of the genome occupied by Suz12. b Bayesian network showing dependency between different factors in random regions of the genome. c Plot showing the accuracy of different sets of variables for characterizing active enhancers and promoters. d ROC curve representing the strength of an SVM-trained model for classifying active enhancers and promoters, using variables with a high accuracy level identified in c
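
The ROC curve in panel d is usually summarized by its area under the curve (AUC). The sketch below computes AUC directly from classifier scores as the fraction of positive/negative pairs that are ranked correctly; it does not train an SVM, and the labels and scores are hypothetical stand-ins for a model's output (1 for active enhancer, 0 for promoter).

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking of positives vs negatives."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0   # positive ranked above negative
            elif p == n:
                correct += 0.5   # ties count as half
    return correct / (len(positives) * len(negatives))


# Hypothetical decision scores for six regions.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 1.0 means every positive region scores above every negative one, while 0.5 corresponds to random ranking.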
