PLoS Comput Biol. 2007 Mar 30;3(3):e61.
doi: 10.1371/journal.pcbi.0030061. Epub 2007 Feb 15.

DNA familial binding profiles made easy: comparison of various motif alignment and clustering strategies

Shaun Mahony et al. PLoS Comput Biol.

Abstract

Transcription factor (TF) proteins recognize a small number of DNA sequences with high specificity and control the expression of neighbouring genes. The evolution of TF binding preference has been the subject of a number of recent studies, in which generalized binding profiles have been introduced and used to improve the prediction of new target sites. Generalized profiles are generated by aligning and merging the individual profiles of related TFs. However, the distance metrics and alignment algorithms used to compare the binding profiles have not yet been fully explored or optimized. As a result, binding profiles depend on TF structural information and sometimes may ignore important distinctions between subfamilies. Predicting the identity or the structural class of a protein that binds to a given DNA pattern will enhance the analysis of microarray and ChIP-chip data, where multiple putative targets of usually unknown TFs are frequently predicted. Various comparison metrics and alignment algorithms are evaluated (a total of 105 combinations). We find that local alignments are generally better than global alignments at detecting eukaryotic DNA motif similarities, especially when combined with the sum of squared distances or Pearson's correlation coefficient comparison metrics. In addition, multiple-alignment strategies for binding profiles and tree-building methods are tested for their efficiency in constructing generalized binding models. A new method for automatically determining the optimal number of clusters is developed and applied to construct a new set of familial binding profiles that improves TF classification accuracy. A software tool, STAMP, is developed to host all tested methods and make them publicly available. This work provides a high-quality reference set of familial binding profiles and the first comprehensive platform for analysis of DNA profiles.
Detecting similarities between DNA motifs is a key step in the comparative study of transcriptional regulation, and the work presented here will form the basis for tool and method development for future transcriptional modeling studies.
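The abstract singles out two column-comparison metrics, the sum of squared distances (SSD) and Pearson's correlation coefficient (PCC). The sketch below shows how each might compare two motif columns, assuming each column is a probability vector over A, C, G and T. The function names, the `2 - SSD` similarity form, and the choice to score a perfectly uniform column as 0.0 under PCC are illustrative assumptions, not STAMP's code.

```python
import math
from typing import Sequence

# A motif column is assumed to be a probability vector over (A, C, G, T).
Column = Sequence[float]


def pearson_column_similarity(x: Column, y: Column) -> float:
    """Pearson's correlation coefficient between two motif columns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        # A perfectly uniform column has zero variance; treat it as uninformative.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def ssd_column_distance(x: Column, y: Column) -> float:
    """Sum of squared differences between two columns (0.0 means identical)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def ssd_column_similarity(x: Column, y: Column) -> float:
    """Assumed similarity form: 2 minus SSD, since SSD of two probability vectors is at most 2."""
    return 2.0 - ssd_column_distance(x, y)
```

PCC is insensitive to how strongly a column is skewed as long as the bases rank in the same order, whereas SSD penalizes absolute differences in frequency; which behaviour is preferable depends on the motifs being compared.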

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Illustration of Familial Binding Profile Construction
In this example, the binding motifs for four bZIP–CREB transcription factors are aligned in a multiple-motif alignment. The generalized familial binding profile corresponds to the weighted average of the individual profiles.
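The weighted-average merge described for Figure 1 can be sketched as follows, assuming the profiles have already been aligned to a common length without gaps and are stored as per-position A, C, G, T probabilities. The `familial_profile` function and its caller-supplied `weights` (equal by default) are hypothetical; the caption does not state how the weights are chosen.

```python
from typing import List, Optional, Sequence

# A profile is assumed to be a list of positions, each a [A, C, G, T] probability list.
Profile = List[List[float]]


def familial_profile(aligned: Sequence[Profile],
                     weights: Optional[Sequence[float]] = None) -> Profile:
    """Merge pre-aligned profiles into one profile by a weighted average per position."""
    if not aligned:
        raise ValueError("need at least one profile")
    length = len(aligned[0])
    if any(len(p) != length for p in aligned):
        raise ValueError("profiles must already be aligned to a common length")
    if weights is None:
        weights = [1.0] * len(aligned)  # equal weights unless told otherwise
    if len(weights) != len(aligned):
        raise ValueError("one weight per profile is required")
    total = sum(weights)
    merged: Profile = []
    for pos in range(length):
        column = [0.0] * 4
        for profile, w in zip(aligned, weights):
            for base in range(4):
                column[base] += w * profile[pos][base]
        # Dividing by the total weight keeps each column a probability vector.
        merged.append([v / total for v in column])
    return merged
```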
Figure 2. Distribution of the Observed Scores of Column-to-Column Comparisons for the Five Main Similarity Metrics
Columns are obtained from the TRANSFAC database [17]. The ALLR_LL distribution is identical to ALLR for every point ≥ −2 (unpublished data). Comparison of the JASPAR motif columns yielded similar results.
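ALLR, the average log-likelihood ratio, compares two columns using their site counts: the sites of each column are scored under the other column's frequency model relative to a background, and the total is divided by the number of sites. The sketch below follows that common definition under stated assumptions: a uniform background, a pseudocount of 0.25 per base, and ALLR_LL modelled as ALLR clipped from below at a floor of −2 (the `lower_limit` parameter is hypothetical).

```python
import math
from typing import List, Sequence

# Assumed uniform background frequencies for A, C, G, T.
BACKGROUND = (0.25, 0.25, 0.25, 0.25)


def _frequencies(counts: Sequence[float], pseudocount: float) -> List[float]:
    """Convert site counts to frequencies, adding a pseudocount to every base."""
    total = sum(counts) + pseudocount * len(counts)
    return [(c + pseudocount) / total for c in counts]


def allr(counts1: Sequence[float], counts2: Sequence[float],
         background: Sequence[float] = BACKGROUND, pseudocount: float = 0.25) -> float:
    """Average log-likelihood ratio between two count columns."""
    p1 = _frequencies(counts1, pseudocount)
    p2 = _frequencies(counts2, pseudocount)
    # Sites of column 2 scored under column 1's model, and vice versa.
    numerator = sum(n2 * math.log(f1 / q) for n2, f1, q in zip(counts2, p1, background))
    numerator += sum(n1 * math.log(f2 / q) for n1, f2, q in zip(counts1, p2, background))
    return numerator / (sum(counts1) + sum(counts2))


def allr_ll(counts1: Sequence[float], counts2: Sequence[float],
            lower_limit: float = -2.0, **kwargs) -> float:
    """ALLR clipped from below; the -2 floor is an assumption of this sketch."""
    return max(allr(counts1, counts2, **kwargs), lower_limit)
```

Because the floor only affects scores below −2, the clipped and unclipped distributions coincide everywhere above it.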
Figure 3. Performance of the Five Main Similarity Metrics in Discriminating between Columns Sampled from Dirichlet Distributions around Information Content I and a Background Distribution
The plot shows the positive predictive rate for an FDR of 1% as a function of the information content.
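Figure 3 parameterizes columns by their information content. Assuming a background distribution q, the information content of a column p is its relative entropy in bits, I = sum over bases b of p_b log2(p_b / q_b), which reaches 2 bits for a single fixed base against a uniform background. The sketch also draws random columns from a Dirichlet distribution by normalizing gamma variates; the `concentration` parameterization around a mean column is an illustrative assumption, not the sampling scheme used for the figure.

```python
import math
import random
from typing import List, Sequence

UNIFORM = (0.25, 0.25, 0.25, 0.25)  # assumed background for A, C, G, T


def information_content(column: Sequence[float],
                        background: Sequence[float] = UNIFORM) -> float:
    """Relative entropy of a column against the background, in bits."""
    # Terms with p = 0 contribute nothing (the limit of p*log p is 0).
    return sum(p * math.log2(p / q) for p, q in zip(column, background) if p > 0)


def sample_dirichlet_column(mean: Sequence[float], concentration: float,
                            rng: random.Random) -> List[float]:
    """Draw a column from Dirichlet(concentration * mean) by normalizing gamma variates."""
    if any(m <= 0 for m in mean):
        raise ValueError("every mean frequency must be positive")
    draws = [rng.gammavariate(concentration * m, 1.0) for m in mean]
    total = sum(draws)
    return [d / total for d in draws]
```

Larger `concentration` values keep the sampled columns closer to `mean`; smaller values spread them out.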
Figure 4. Average Homogeneity of Families Represented at Each Tree Node as a Function of the Growth of the Tree
Six scoring metrics and two different tree-building methods are tested with ungapped Smith–Waterman alignments.
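Without gaps, the Smith–Waterman recurrence reduces to H(i, j) = max(0, H(i−1, j−1) + s(i, j)), where s is a column similarity score, so each diagonal is scanned for its best-scoring contiguous run. A minimal sketch, assuming PCC as the column score and comparing one strand only (reverse complements are not considered); `ungapped_local_alignment` is a hypothetical name.

```python
import math
from typing import Callable, Sequence, Tuple

Column = Sequence[float]  # probabilities for A, C, G, T


def pcc(x: Column, y: Column) -> float:
    """Pearson's correlation between two columns (0.0 if either is uniform)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0.0 or vy == 0.0 else cov / math.sqrt(vx * vy)


def ungapped_local_alignment(
    m1: Sequence[Column],
    m2: Sequence[Column],
    score: Callable[[Column, Column], float] = pcc,
) -> Tuple[float, int, int, int]:
    """Best ungapped local alignment: (score, start in m1, start in m2, length)."""
    best_score, best_end1, best_end2, best_len = 0.0, 0, 0, 0
    prev_h = [0.0] * (len(m2) + 1)  # H values for the previous row
    prev_len = [0] * (len(m2) + 1)  # length of the run ending at each cell
    for i in range(1, len(m1) + 1):
        cur_h = [0.0] * (len(m2) + 1)
        cur_len = [0] * (len(m2) + 1)
        for j in range(1, len(m2) + 1):
            # Extend the diagonal run; a non-positive total resets the cell to 0.
            extended = prev_h[j - 1] + score(m1[i - 1], m2[j - 1])
            if extended > 0.0:
                cur_h[j] = extended
                cur_len[j] = prev_len[j - 1] + 1
                if extended > best_score:
                    best_score, best_end1, best_end2, best_len = extended, i, j, cur_len[j]
        prev_h, prev_len = cur_h, cur_len
    return best_score, best_end1 - best_len, best_end2 - best_len, best_len
```

For example, aligning a one-hot "ACG" motif against a one-hot "TACG" motif should place the three matching columns at offset 1 in the second motif.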
Figure 5. The Tree Resulting from a UPGMA Tree Construction of Ten JASPAR Families (71 Motifs Total) Using the PCC Scoring Metric and Smith–Waterman (Ungapped) Alignment Method
The red line represents the level at which the CH log metric estimates the optimal number of data clusters on the tree.
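UPGMA builds a tree such as the one in Figure 5 by repeatedly merging the two closest clusters and setting the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. The sketch below assumes a precomputed symmetric matrix of motif-to-motif distances; how those distances are derived from alignment scores is left open, and `upgma` is a hypothetical helper that returns the tree as nested tuples.

```python
from typing import Dict, List, Sequence, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def _key(i: int, j: int) -> Tuple[int, int]:
    """Store each cluster pair once, smaller id first."""
    return (i, j) if i < j else (j, i)


def upgma(labels: Sequence[str],
          dist: Sequence[Sequence[float]]) -> Tuple[Tree, List[Tuple[float, int]]]:
    """Return the UPGMA tree plus (merge distance, merged cluster size) for each merge."""
    clusters: Dict[int, Tuple[Tree, int]] = {i: (lab, 1) for i, lab in enumerate(labels)}
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i in range(len(labels)) for j in range(i + 1, len(labels))
    }
    merges: List[Tuple[float, int]] = []
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        (a, b), distance = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Size-weighted average distance from the new cluster to each remaining one.
        for k in clusters:
            d[_key(next_id, k)] = (d[_key(a, k)] * size_a + d[_key(b, k)] * size_b) / (size_a + size_b)
        # Drop every pair that involves a cluster that no longer exists.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        merges.append((distance, size_a + size_b))
        next_id += 1
    root = next(iter(clusters.values()))[0]
    return root, merges
```

Cutting such a tree at a chosen level yields a given number of clusters, which is the quantity the CH log metric is used to pick.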
Figure 6. The Behaviour of the Calinski and Harabasz–Based Log-Metric (CH log) for the Tree in Figure 5 as the Number of Clusters (g) Is Varied
The value of g = 17 produces a global minimum in the value of CH log.
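The CH log metric is derived from the Calinski and Harabasz index. The sketch below computes only the standard index, CH = [tr(B) / (g − 1)] / [tr(W) / (n − g)] for n items in g clusters, using the identity that a cluster's scatter around its centroid equals the sum of squared pairwise distances over ordered pairs divided by twice the cluster size. It assumes the distances behave like Euclidean distances and does not reproduce the log-based variant used here, which is minimized, whereas the standard index is maximized.

```python
import math
from typing import Dict, Hashable, List, Sequence


def calinski_harabasz(dist: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """Standard Calinski-Harabasz index computed from a pairwise distance matrix."""
    n = len(labels)
    groups: Dict[Hashable, List[int]] = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    g = len(groups)
    if not 1 < g < n:
        raise ValueError("need more than one cluster and fewer clusters than items")

    def scatter(members: List[int]) -> float:
        # Sum of squared distances to the centroid, via ordered pairwise distances.
        return sum(dist[i][j] ** 2 for i in members for j in members) / (2 * len(members))

    total = scatter(list(range(n)))
    within = sum(scatter(members) for members in groups.values())
    between = total - within
    if within == 0.0:
        return math.inf
    return (between / (g - 1)) / (within / (n - g))
```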
Figure 7. The Tree Resulting from a UPGMA Tree Construction of 12 JASPAR Families (79 Motifs Total) Using the PCC Scoring Metric and Smith–Waterman (Ungapped) Alignment Method
This tree includes the two zinc-finger families (GATA and DOF).
Figure 8. Optimal Number of Clusters of the 71 JASPAR Motifs, According to Our Method
PCC with Smith–Waterman ungapped alignment was used as a scoring function. Examples of protein–DNA complexes are provided for comparison.
Figure 9. Similarity between the HMG and Forkhead Motifs
These families are grouped together in the HMG/Forkhead Group I cluster (Figure 8).

References

    1. Stormo GD. DNA binding sites: Representation and discovery. Bioinformatics. 2000;16:16–23.
    2. Suzuki M, Yagi N. DNA recognition code of transcription factors in the helix-turn-helix, probe helix, hormone receptor, and zinc finger families. Proc Natl Acad Sci U S A. 1994;91:12357–12361.
    3. Rudolph MJ, Gergen JP. DNA-binding by Ig-fold proteins. Nat Struct Biol. 2001;8:384–386.
    4. Pabo CO, Peisach E, Grant RA. Design and selection of novel Cys2His2 zinc finger proteins. Annu Rev Biochem. 2001;70:313–340.
    5. Auron PE. DNA sequence-specific transcription factors. In: Lotze MT, Thomson AW, editors. Measuring immunity: Basic science and clinical practice. 1st edition. London: Elsevier; 2004. pp. 91–109.
