DNA Res. 2009 Oct;16(5):261-73.
doi: 10.1093/dnares/dsp014. Epub 2009 Sep 9.

Exhaustive search for over-represented DNA sequence motifs with CisFinder

Alexei A Sharov et al. DNA Res. 2009 Oct.

Abstract

We present CisFinder software, which generates a comprehensive list of motifs enriched in a set of DNA sequences and describes them with position frequency matrices (PFMs). A new algorithm was designed to estimate PFMs directly from counts of n-mer words with and without gaps; then PFMs are extended over gaps and flanking regions and clustered to generate non-redundant sets of motifs. The algorithm successfully identified binding motifs for 12 transcription factors (TFs) in embryonic stem cells based on published chromatin immunoprecipitation sequencing data. Furthermore, CisFinder successfully identified alternative binding motifs of TFs (e.g. POU5F1, ESRRB, and CTCF) and motifs for known and unknown co-factors of genes associated with the pluripotent state of ES cells. CisFinder also showed robust performance in the identification of motifs that were only slightly enriched in a set of DNA sequences.

Figures

Figure 1
CisFinder algorithm for de novo identification of DNA motifs. (A) Example of a nucleotide substitution matrix for word ATGCAAAT; (B) frequency substitution matrices for the test and control sequences; (C) subtraction of matrices; (D) negative values are replaced by zero; (E) normalized PFM; (F) position and width of gaps in the words; (G) extending the PFM over the gaps and flanking sequences; (H) clustering and combining of PFMs to generate a sequence logo.
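Steps (A) to (E) of the caption above can be illustrated with a minimal Python sketch. It is an interpretation of the caption only, not the CisFinder implementation: the function names (`count_occurrences`, `substitution_frequencies`, `estimate_pfm`) are hypothetical, frequencies are assumed to be normalized by total sequence length, only the forward strand is scanned, and the uniform fallback for empty positions is an added assumption. Gaps, extension over flanking regions, and clustering (steps F to H) are not covered.

```python
from typing import List

NUCLEOTIDES = "ACGT"  # column order used for every matrix row


def count_occurrences(word: str, sequences: List[str]) -> int:
    """Count (possibly overlapping) forward-strand occurrences of word."""
    total = 0
    for seq in sequences:
        for start in range(len(seq) - len(word) + 1):
            if seq[start:start + len(word)] == word:
                total += 1
    return total


def substitution_frequencies(word: str, sequences: List[str]) -> List[List[float]]:
    """Steps (A)-(B): for each position and each nucleotide, the frequency
    of the word with that position substituted by that nucleotide.
    Assumption: frequency = count / total number of bases."""
    total_bases = sum(len(s) for s in sequences) or 1
    matrix = []
    for pos in range(len(word)):
        row = []
        for base in NUCLEOTIDES:
            variant = word[:pos] + base + word[pos + 1:]
            row.append(count_occurrences(variant, sequences) / total_bases)
        matrix.append(row)
    return matrix


def estimate_pfm(word: str, test: List[str], control: List[str]) -> List[List[float]]:
    """Steps (C)-(E): subtract control from test, clip negatives to zero,
    then normalize each position so its four values sum to 1."""
    test_freq = substitution_frequencies(word, test)
    control_freq = substitution_frequencies(word, control)
    pfm = []
    for test_row, control_row in zip(test_freq, control_freq):
        diff = [max(t - c, 0.0) for t, c in zip(test_row, control_row)]
        row_sum = sum(diff)
        if row_sum > 0:
            pfm.append([d / row_sum for d in diff])
        else:
            # Assumption: fall back to a uniform row when nothing is enriched.
            pfm.append([0.25] * 4)
    return pfm


# Toy data (made up for illustration, not from the paper).
test_seqs = ["ATGCAAAT", "ATGCAAAT", "ATGCATAT"]
control_seqs = ["GGGGGGGG"]
pfm = estimate_pfm("ATGCAAAT", test_seqs, control_seqs)
for row in pfm:
    print(" ".join(f"{value:.2f}" for value in row))
```

In this toy input the sixth position of the word is seen as A twice and as T once in the test set, so its row carries weight on both nucleotides, while the other positions are fully determined by the seed word.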
Figure 2
Testing CisFinder algorithm. (A) Binding motifs of POU5F1 generated by clustering of PFMs (with CisFinder) and over-represented 8-mer words. Binding motifs of TFs in ES cells identified with CisFinder. (B) Comparison of TF binding motifs generated by Chen et al. using Weeder and motifs generated with CisFinder. (C–E) Binding motifs of POU5F1, ESRRB, and CTCF, respectively, identified with CisFinder.
Figure 3
Motifs of TFs and their co-factors over-represented in ChIP-seq (data from Chen et al.) distal binding sites (200 bp segments centered at binding sites and located 500–100 000 bp away from transcription start sites) compared with flanking regions 500–1000 bp away from binding sites. Motifs were selected if they were over-represented by >2-fold for at least one TF; search was done with CisFinder using the option of one false positive match per 10 kb. Groups of TFs and binding motifs with high over-representation rate are outlined.
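The selection rule in this caption (keep a motif if it is over-represented by more than 2-fold for at least one TF) can be sketched in Python as follows. This is only an illustration under stated assumptions: over-representation is taken here as the ratio of match densities (matches per bp) in binding-site segments versus flanking regions, which the caption does not spell out, and the function names, motif labels, and numbers are hypothetical.

```python
from typing import Dict, List


def over_representation(test_hits: int, test_bp: int,
                        control_hits: int, control_bp: int) -> float:
    """Ratio of motif match density in test vs. control sequences.
    Assumption: density = matches per base pair."""
    if control_hits == 0:
        return float("inf") if test_hits > 0 else 0.0
    # Algebraically equal to (test_hits / test_bp) / (control_hits / control_bp).
    return (test_hits * control_bp) / (control_hits * test_bp)


def select_motifs(ratios: Dict[str, Dict[str, float]],
                  threshold: float = 2.0) -> List[str]:
    """Keep motifs whose ratio exceeds the threshold for at least one TF."""
    return [motif for motif, by_tf in ratios.items()
            if any(ratio > threshold for ratio in by_tf.values())]


# Made-up ratios per motif and TF data set, for illustration only.
example_ratios = {
    "motif_a": {"TF1": 3.0, "TF2": 1.1},
    "motif_b": {"TF1": 1.5, "TF2": 1.9},
}
selected = select_motifs(example_ratios)
print(selected)
```

With these made-up values only `motif_a` passes, because it is the only motif that exceeds the 2-fold threshold for at least one TF.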
