BMC Bioinformatics. 2011 Jul 7;12:277.
doi: 10.1186/1471-2105-12-277.

An integrated ChIP-seq analysis platform with customizable workflows

Eugenia G Giannopoulou et al. BMC Bioinformatics. 2011.

Abstract

Background: Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) enables unbiased, genome-wide mapping of protein-DNA interactions and epigenetic marks. The first step in ChIP-seq data analysis is the identification of peaks (i.e., genomic locations with a high density of mapped sequence reads). The next step is to interpret the biological meaning of the peaks by associating them with known genes, pathways, and regulatory elements, and by integrating them with other experiments. Although several programs have been published for the analysis of ChIP-seq data, they often focus on the peak detection step and are usually not well suited for thorough, integrative analysis of the detected peaks.

Results: To address the peak interpretation challenge, we have developed ChIPseeqer, an integrative, comprehensive, fast and user-friendly computational framework for in-depth analysis of ChIP-seq datasets. The novelty of our approach is the capability to combine several computational tools in order to create easily customized workflows that can be adapted to the user's needs and objectives. In this paper, we describe the main components of the ChIPseeqer framework, and also demonstrate the utility and diversity of the analyses offered, by analyzing a published ChIP-seq dataset.

Conclusions: ChIPseeqer facilitates ChIP-seq data analysis by offering a flexible and powerful set of computational tools that can be used in combination with one another. The framework is freely available as a user-friendly GUI application, but all programs are also executable from the command line, thus providing flexibility and automatability for advanced users.

Figures

Figure 1
Workflow use cases. Examples of workflows that can be easily generated using tools from the ChIPseeqer framework are shown. The starting point is always the result of peak detection: a set of enriched regions/peaks. (A) The aim of the workflow is to analyze a subset of the peaks that have a specific motif. From all the peaks that have the motif, we look for those located in the promoters of known genes. Pathway analysis is then performed on these genes in order to reveal enriched pathways associated with this particular subset of peaks. (B) This workflow allows locating and characterizing distal regulatory elements (i.e., intergenic peaks) that overlap with enhancer marks (e.g., H3K4me1 binding), in terms of motifs and conservation. Different workflows can be created using any combination of the ChIPseeqer tools.
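The first two steps of workflow (A) can be illustrated with a minimal Python sketch. It is not ChIPseeqer code: the `Peak` and `Promoter` records, the toy coordinates, the gene names, and the motif `GGAAGT` are all hypothetical, and motif matching is reduced to an exact forward-strand substring search. The pathway analysis step is left out.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    seq: str    # DNA sequence under the peak (hypothetical field)


class Promoter(NamedTuple):
    gene: str
    chrom: str
    start: int
    end: int


def peaks_with_motif(peaks, motif):
    """Keep peaks whose sequence contains the motif (exact match, forward strand only)."""
    motif = motif.upper()
    return [p for p in peaks if motif in p.seq.upper()]


def genes_with_promoter_peaks(peaks, promoters):
    """Return the sorted names of genes whose promoter overlaps at least one peak."""
    genes = set()
    for p in peaks:
        for pr in promoters:
            same_chrom = p.chrom == pr.chrom
            # Half-open intervals overlap when each starts before the other ends.
            if same_chrom and p.start < pr.end and pr.start < p.end:
                genes.add(pr.gene)
    return sorted(genes)


# Toy data, invented for illustration only.
peaks = [
    Peak("chr1", 100, 200, "AAGGAAGTCC"),
    Peak("chr1", 5000, 5100, "TTTTTTTTTT"),
    Peak("chr2", 300, 400, "CCGGAAGTAA"),
]
promoters = [
    Promoter("GENE_A", "chr1", 150, 1150),
    Promoter("GENE_B", "chr2", 1000, 2000),
]

motif_peaks = peaks_with_motif(peaks, "GGAAGT")
target_genes = genes_with_promoter_peaks(motif_peaks, promoters)
print(target_genes)
```

The resulting gene list is what a pathway enrichment step would take as input.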
Figure 2
Analysis of the ETS1 ChIP-seq dataset. (A) The ChIPseeqerAnnotate module outputs the distribution of the ETS1 binding peaks across gene parts, as well as separate lists of the peaks found in each gene part (e.g., promoters, exons, introns). (B) The occurrence of specific motifs among the ETS1 peaks is shown, after using ChIPseeqerMotifMatch. The underlined motifs represent transcription factors of the ETS domain. (C) Unsupervised motif discovery, using ChIPseeqerFIRE, reveals multiple motifs that derive from the same regions. The fraction of ETS1 peaks containing at least one instance of each motif is given, with the expected frequency of the motif in random regions given in parentheses.
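The statistic reported in panel (C), the fraction of peaks containing at least one instance of a motif compared with the same fraction in random regions, can be sketched as follows. This is an illustrative calculation under simplifying assumptions, not the method used by ChIPseeqerMotifMatch or ChIPseeqerFIRE: motifs are treated as exact strings searched on both strands, and the sequences and the motif `GGAAGT` are invented for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def contains_motif(seq, motif):
    """True if the motif occurs at least once on either strand (exact match)."""
    seq = seq.upper()
    motif = motif.upper()
    return motif in seq or reverse_complement(motif) in seq


def motif_fraction(seqs, motif):
    """Fraction of sequences with at least one motif instance."""
    if not seqs:
        return 0.0
    hits = sum(1 for s in seqs if contains_motif(s, motif))
    return hits / len(seqs)


# Toy sequences, invented for illustration only.
peak_seqs = ["AAGGAAGTCC", "TTACTTCCAA", "GGGGGGGGGG", "CAGGAAGTTT"]
random_seqs = ["ACGTACGTAC", "TTTTTTTTTT", "AGGAAGTGGG", "CCCCCCCCCC"]

observed = motif_fraction(peak_seqs, "GGAAGT")
expected = motif_fraction(random_seqs, "GGAAGT")
print(f"peaks: {observed:.2f} (random: {expected:.2f})")
```

Counting each sequence at most once, rather than counting every motif instance, matches the "at least one instance" wording of the caption.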
Figure 3
Identification of putative enhancers. This workflow shows the identification of putative enhancers, by progressively filtering the distal peaks with histone modification enhancer marks (i.e., presence of H3K4me1 and absence of H3K4me3) and CBP binding. De novo motif discovery and conservation analysis were then performed, which showed highly enriched ETS-domain motifs and high conservation scores in the set of putative enhancers compared to random regions.
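The progressive filtering described in this caption can be sketched as a chain of interval-overlap tests. The code below is a hedged illustration rather than the ChIPseeqer implementation: regions are hypothetical `(chrom, start, end)` tuples with half-open coordinates, "presence" and "absence" of a mark are reduced to overlap or no overlap with any region of that mark, and all coordinates are invented.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open regions share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def overlaps_any(region, others):
    """True if the region overlaps at least one region in the other set."""
    return any(overlaps(region, o) for o in others)


def putative_enhancers(distal_peaks, h3k4me1, h3k4me3, cbp):
    """Filter distal peaks: H3K4me1 present, H3K4me3 absent, CBP present."""
    with_me1 = [p for p in distal_peaks if overlaps_any(p, h3k4me1)]
    without_me3 = [p for p in with_me1 if not overlaps_any(p, h3k4me3)]
    with_cbp = [p for p in without_me3 if overlaps_any(p, cbp)]
    return with_cbp


# Toy regions, invented for illustration only.
distal_peaks = [
    ("chr1", 1000, 1200),
    ("chr1", 5000, 5200),
    ("chr2", 300, 500),
    ("chr3", 100, 300),
]
h3k4me1 = [("chr1", 900, 1100), ("chr1", 5100, 5300), ("chr2", 250, 350)]
h3k4me3 = [("chr1", 5150, 5250)]
cbp = [("chr1", 1150, 1250), ("chr1", 5000, 5100)]

enhancers = putative_enhancers(distal_peaks, h3k4me1, h3k4me3, cbp)
print(enhancers)
```

The nested scans are quadratic in the number of regions, which is fine for a toy example; genome-scale data would normally call for sorted intervals or an interval tree.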
Figure 4
ChIPseeqer graphical interface. (A) Users can control all parameters of the tools. For example, in the Find Pathway tool (the GUI version of ChIPseeqerPathwayMatch) the user can select: the input peaks, the species the data come from, the gene annotation database used to extract the genes related to the input peaks, which subset of the peaks to include in the analysis (e.g., promoter peaks, intergenic peaks), and which pathways database to search. The desired pathway can be either selected from a list of available pathways or typed by the user (e.g., apoptosis, development). (B) The typical output of each tool is a table summarizing all peaks resulting from the analysis, as well as basic statistics (e.g., how many peaks were found). Here, the peaks that contain the TCCTAGA motif are shown, after using the Find Motif in peaks tool (the GUI version of ChIPseeqerMotifMatch). (C) Several tools also provide graphical output. For example, the summary result of the iPAGE tool (the GUI version of ChIPseeqeriPAGE) is a pathway enrichment table showing the level of enrichment for all pathways found in the genes related to the input peaks (category 1), compared to the genes used as background (category 0). (D) The output of the Similarity coefficient tool (the GUI version of ChIPseeqerComputeJaccardIndex) is a color-coded matrix in which darker red marks the pairs of datasets that share more peaks.
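The similarity matrix in panel (D) is built from pairwise Jaccard indices, J(A, B) = |A ∩ B| / |A ∪ B|. The caption does not say how ChIPseeqerComputeJaccardIndex defines the intersection and union for peak sets, so the sketch below makes an assumption: it measures both in covered base pairs. The dataset names and coordinates are toy values, and expanding regions into per-base sets is only practical for small examples.

```python
def covered_positions(peaks):
    """Expand (chrom, start, end) half-open regions into a set of (chrom, pos) pairs."""
    return {(chrom, pos) for chrom, start, end in peaks for pos in range(start, end)}


def jaccard(peaks_a, peaks_b):
    """Base-pair Jaccard index between two peak sets (assumed definition)."""
    a = covered_positions(peaks_a)
    b = covered_positions(peaks_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def jaccard_matrix(datasets):
    """Pairwise Jaccard indices for a dict mapping dataset name to peak list."""
    names = list(datasets)
    return {(x, y): jaccard(datasets[x], datasets[y]) for x in names for y in names}


# Toy datasets, invented for illustration only.
datasets = {
    "ETS1": [("chr1", 100, 200)],
    "CBP": [("chr1", 150, 250)],
    "H3K4me3": [("chr2", 0, 100)],
}

matrix = jaccard_matrix(datasets)
for (x, y), value in matrix.items():
    print(f"{x}\t{y}\t{value:.3f}")
```

Each value lies between 0 (no shared bases) and 1 (identical coverage), so the matrix maps directly onto a color scale like the one described in the caption.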
