Nucleic Acids Res. 2015 Oct 30;43(19):e125.
doi: 10.1093/nar/gkv607. Epub 2015 Jun 18.

By the company they keep: interaction networks define the binding ability of transcription factors

Davide Cirillo et al. Nucleic Acids Res.

Abstract

Access to genome-wide data provides the opportunity to address questions concerning the ability of transcription factors (TFs) to assemble in distinct macromolecular complexes. Here, we introduce the PAnDA (Protein And DNA Associations) approach to characterize DNA associations with human TFs using expression profiles, protein-protein interactions and recognition motifs. Our method predicts TF binding events with >0.80 accuracy, revealing cell-specific regulatory patterns that can be exploited for future investigations. Even when the precise DNA-binding motifs of a specific TF are not available, the information derived from protein-protein networks is sufficient to perform high-confidence predictions (area under the ROC curve of 0.89). PAnDA is freely available at http://service.tartaglialab.com/new_submission/panda.

Figures

Figure 1.
Trends in PPI networks. (a) Graphical representation of TF binding modes. In addition to direct binding (layer 1 in the interaction network, blue dots), we take into account the contribution of cofactors (layer 2, red dots) and mediated cofactors (layer 3, green dots). (b) Each network layer shows a significant difference in the frequencies of binding motifs associated with high and low ChIP-seq peaks (motifs were retrieved from a number of open-source databases; Online Methods: TFBSs databases).
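
The three layers in the Figure 1 caption can be read as distances from the target TF in the interaction graph. The following is a minimal Python sketch of that reading, not the PAnDA implementation. It assumes the network is supplied as an adjacency dictionary; the function name and the protein names and edges in the toy network are hypothetical placeholders.

```python
from collections import deque

def network_layers(adjacency, target, max_layer=3):
    """Group proteins by layer: 1 = target TF, 2 = cofactors, 3 = mediated cofactors."""
    layer_of = {target: 1}
    queue = deque([target])
    while queue:
        protein = queue.popleft()
        if layer_of[protein] == max_layer:
            continue  # do not expand beyond the last layer
        for partner in adjacency.get(protein, ()):
            if partner not in layer_of:
                layer_of[partner] = layer_of[protein] + 1
                queue.append(partner)
    layers = {n: set() for n in range(1, max_layer + 1)}
    for protein, layer in layer_of.items():
        layers[layer].add(protein)
    return layers

# Hypothetical toy network (placeholder names, not real interactions).
toy_network = {
    "TF": ["A", "B"],
    "A": ["TF", "C"],
    "B": ["TF"],
    "C": ["A", "D"],
    "D": ["C"],
}
layers = network_layers(toy_network, "TF")
```

In the toy network, `D` is four steps from the target and is therefore left out of all three layers.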
Figure 2.
Training the PAnDA approach. Network layers and algorithm performance. In 275 datasets (8), we observed a consistent increase in cross-validation accuracies (ΔPerformance >0; Online Methods: trend analysis and models selection) upon layer integration (layer 1 → layer 2; layer 2 → layer 3; Online Methods: TFBSs databases; blue dots indicate ΔPerformance with P-value < 0.01). Colors highlight specific trends: light green and pink indicate that addition of layers 2 and 3 is associated with an increase in predictive power (light green: layer 3 has a stronger signal than layer 2; pink: vice versa), while light blue and purple indicate a decrease (light blue: layer 3 has a lower signal than layer 2; purple: vice versa).
Figure 3.
Testing the PAnDA approach. Performance on independent test sets [147 datasets (8); ‘Material and Methods’ section: ChIP-seq datasets]. Four models based on different network layers (model 1: layer 1; model 2: layers 1 and 2; model 3: layers 1, 2 and 3; model 4: layers 2 and 3) were applied to cases without annotated target TF motifs (Supplementary Figure S1c). Areas under the ROC curve (AUROCs) show that interaction network information (layers 2 and 3) provides an accurate description of binding events.
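
For readers unfamiliar with the metric, the AUROC reported here equals the probability that a randomly chosen bound sequence receives a higher score than a randomly chosen unbound one. The sketch below computes it by direct pairwise comparison. The scores and labels are hypothetical and are not taken from the paper, and the source does not state how PAnDA itself computes the value.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical scores for bound (1) and unbound (0) sequences.
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [1, 0, 1, 0]
example_auroc = auroc(example_scores, example_labels)  # 3 of 4 pairs ranked correctly
```

A value of 0.50 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the four models are compared.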
Figure 4.
Example of PPI networks used in PAnDA calculations. (a) Components of PPI networks selected for predictions of NANOG interactions [H1-hESC cell line] (8); (b) Performance based on DNA-binding motifs of target TF (model 1: NANOG; AUROC = 0.50), target TF and cofactors (model 2: NANOG, TP53, SOX2 and POU5F1; AUROC = 0.60), target TF, cofactors and mediated cofactors (model 3: NANOG, TP53, SOX2, POU5F1, CTCF and YY1; AUROC = 0.98) and cofactors and mediated cofactors (model 4: TP53, SOX2, POU5F1, CTCF and YY1; AUROC = 0.96). The network is represented using squares for the target TF (NANOG) and circles for other proteins (cofactors and mediated cofactors). The color palette refers to quantiles of expression levels (increasing from blue to yellow). Factors predicted not to be relevant for the binding of the target TF are colored in gray.
Figure 5.
Specificity of PAnDA models. (a) Randomization of regulatory motifs. We built 10 independent models using shuffled associations between regulatory motifs and DNA-binding proteins present in the following databases: SeAMotE (24), Jolma (14), JASPAR CORE (13), Wang (23) and UniPROBE (15). Compared with PAnDA performance (red bars), the random models (gray bars) show negligible predictive power (AUROCs ∼ 0.50) on the test set, indicating that regulatory motifs are specific for DNA targets. We note that the regulatory motifs generated with the SeAMotE approach (24) are shorter [6 nucleotides on average] than those present in Jolma (14) [12 nucleotides], JASPAR CORE (13) [12 nucleotides], Wang (23) [16 nucleotides] and UniPROBE (15) [16 nucleotides], which results in poorer performance. (b) Randomization of expression levels. For each PPI network, selection of cofactors and mediated cofactors is based on cell-line abundances. Shuffling the expression levels of all DNA-binding proteins, we built 10 models (gray bars) with randomized PPI networks. On the test set, the models have poorer predictive power (AUROCs ∼ 0.50) than PAnDA (red bars), which suggests that the components of each PPI network are highly specific for the cell line of interest. In both plots, AUROC averages and standard deviations are shown.
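
The randomization control in panel (a) can be sketched as a permutation of the motif-to-protein assignment. This is an illustrative Python sketch only. The function name, the placeholder proteins and the motif strings are hypothetical, and the source does not specify how the authors implemented the shuffling.

```python
import random

def shuffle_associations(motifs_by_protein, seed=0):
    """Return a copy in which motif sets are randomly reassigned among proteins."""
    rng = random.Random(seed)
    proteins = list(motifs_by_protein)
    motif_sets = [motifs_by_protein[p] for p in proteins]
    rng.shuffle(motif_sets)  # permute motif sets while keeping the protein list fixed
    return dict(zip(proteins, motif_sets))

# Hypothetical placeholder proteins and motifs.
motifs = {"P1": ["ACGT"], "P2": ["TTGACA"], "P3": ["GGCC"], "P4": ["CATG"]}

# Ten shuffled association tables, mirroring the 10 random models in the caption.
random_models = [shuffle_associations(motifs, seed=s) for s in range(10)]
```

Because the shuffle only permutes existing motif sets, every random model keeps the same proteins and the same motifs, so any loss of predictive power is attributable to the broken associations rather than to a change in motif content.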
Figure 6.
Stability of PAnDA models. (a) Interaction network destabilization. We found a significant decrease in predictive performance (AUROC; averages and standard deviations shown) upon removal of cofactors and mediated cofactors (model 4; Online Methods: Models stability). (b) Mutations of DNA sequences. From low (1/100, or 1 mutation in 100 nt) to high (R, or 1 mutation per nucleotide) mutation rates, the motifs mapped by cofactors and mediated cofactors are appreciably reduced (500 sequences per ChIP dataset; model 4; Online Methods: Models stability), which affects predictive performance (AUROC; averages and standard deviations shown).
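
The sequence perturbation in panel (b) can be illustrated with a simple substitution routine. The sketch below treats the mutation rate as an independent per-nucleotide substitution probability, which is an assumption: the caption only gives the rates as "1 mutation in 100 nt" up to "1 mutation per nucleotide", and the authors' exact procedure is described in their Online Methods. The function name and example sequence are hypothetical.

```python
import random

NUCLEOTIDES = "ACGT"

def mutate(sequence, rate, seed=0):
    """Substitute each nucleotide, with probability `rate`, by a different one."""
    rng = random.Random(seed)
    mutated = []
    for base in sequence:
        if rng.random() < rate:
            # Always pick a nucleotide that differs from the original base.
            mutated.append(rng.choice([n for n in NUCLEOTIDES if n != base]))
        else:
            mutated.append(base)
    return "".join(mutated)

# Hypothetical 100-nt sequence.
original = "ACGT" * 25
lightly_mutated = mutate(original, rate=1 / 100)  # about 1 substitution expected
fully_mutated = mutate(original, rate=1.0)        # every position substituted
```

At `rate=1.0` no position retains its original nucleotide, so any motif match in the mutated sequence arises by chance.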
Figure 7.
Using the PAnDA approach. Once DNA and TF sequences are submitted to the PAnDA web server, (a) PPI networks are selected from publicly available databases using expression levels to retrieve components of PPI networks that are active in specific cell lines; (b) Regulatory motifs of DNA-binding proteins are mapped onto DNA sequences and reported in a table; (c) Three algorithms predict protein-DNA interactions exploiting the first (TF), second (TF and cofactors) and third (TF, cofactors and mediated cofactors) layers of PPI networks. If DNA motifs of input TFs are missing, an alternative model (model 4) based on motifs of cofactors and mediated cofactors is employed. Each protein association is scored with a value for the propensity of the interaction to occur (see also Online Tutorial).
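
The model choice described in step (c) can be summarized in a few lines. This sketch only encodes the layer combinations stated in the Figure 3 and Figure 7 captions. The function names, the data layout and the example motifs are hypothetical, and it does not call or reproduce the PAnDA web server.

```python
# Layers used by each model, as listed in the Figure 3 caption.
MODEL_LAYERS = {1: (1,), 2: (1, 2), 3: (1, 2, 3), 4: (2, 3)}

def models_to_run(target_has_motifs):
    """Models 1 to 3 need target TF motifs; model 4 is the fallback without them."""
    return [1, 2, 3] if target_has_motifs else [4]

def motifs_for_model(model, motifs_by_layer):
    """Collect the motifs from every layer that the given model uses."""
    selected = []
    for layer in MODEL_LAYERS[model]:
        selected.extend(motifs_by_layer.get(layer, []))
    return selected

# Hypothetical motifs per layer; layer 1 is empty because the target TF has no known motif.
example_motifs = {1: [], 2: ["TTGACA"], 3: ["GGCC", "CATG"]}
chosen = models_to_run(target_has_motifs=bool(example_motifs[1]))
fallback_motifs = motifs_for_model(chosen[0], example_motifs)
```

With no motif annotated for the target TF, only model 4 is selected and it draws exclusively on the motifs of cofactors and mediated cofactors.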

References

    1. Villar D., Flicek P., Odom D.T. Evolution of transcription factor binding in metazoans - mechanisms and functional implications. Nat. Rev. Genet. 2014;15:221–233.
    2. Weirauch M.T., Cote A., Norel R., Annala M., Zhao Y., Riley T.R., Saez-Rodriguez J., Cokelaer T., Vedenko A., Talukder S., et al. Evaluation of methods for modeling transcription factor sequence specificity. Nat. Biotech. 2013;31:126–134.
    3. Weingarten-Gabbay S., Segal E. The grammar of transcriptional regulation. Hum. Genet. 2014;133:701–711.
    4. Gerstein M.B., Kundaje A., Hariharan M., Landt S.G., Yan K.-K., Cheng C., Mu X.J., Khurana E., Rozowsky J., Alexander R., et al. Architecture of the human regulatory network derived from ENCODE data. Nature. 2012;489:91–100.
    5. Vaquerizas J.M., Kummerfeld S.K., Teichmann S.A., Luscombe N.M. A census of human transcription factors: function, expression and evolution. Nat. Rev. Genet. 2009;10:252–263.