PLoS Biol. 2007 Jan;5(1):e8.
doi: 10.1371/journal.pbio.0050008.

Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles

Jeremiah J Faith et al. PLoS Biol. 2007 Jan.

Abstract

Machine learning approaches offer the potential to systematically identify transcriptional regulatory interactions from a compendium of microarray expression profiles. However, experimental validation of the performance of these methods at the genome scale has remained elusive. Here we assess the global performance of four existing classes of inference algorithms using 445 Escherichia coli Affymetrix arrays and 3,216 known E. coli regulatory interactions from RegulonDB. We also developed and applied the context likelihood of relatedness (CLR) algorithm, a novel extension of the relevance networks class of algorithms. CLR demonstrates an average precision gain of 36% relative to the next-best performing algorithm. At a 60% true positive rate, CLR identifies 1,079 regulatory interactions, of which 338 were in the previously known network and 741 were novel predictions. We tested the predicted interactions for three transcription factors with chromatin immunoprecipitation, confirming 21 novel interactions and verifying our RegulonDB-based performance estimates. CLR also identified a regulatory link providing central metabolic control of iron transport, which we confirmed with real-time quantitative PCR. The compendium of expression data compiled in this study, coupled with RegulonDB, provides a valuable model system for further improvement of network inference algorithms using experimental data.

Conflict of interest statement

Competing interests. A portion of this work was conducted in collaboration with Cellicon Biotechnologies. JJC and TSG are founders and shareholders in the company. GC and JW are also shareholders in the company. All data, results, and algorithms from this collaboration have been made publicly available.

Figures

Figure 1. Overview of Our Approach for Mapping the E. coli Transcriptional Regulatory Network
Microarray expression profiles were obtained from several investigators. Our laboratory profiled additional conditions, focusing on DNA damage, stress responses, and persistence. These two data sources were combined into one uniformly normalized E. coli microarray compendium that was analyzed with the CLR network inference algorithm. The predicted regulatory network was validated using RegulonDB, sequence analysis, and ChIP. The validated network was then examined for cases of combinatorial regulation, one of which was explored with follow-up real-time quantitative PCR experiments.
Figure 2. The CLR Algorithm: Methods and Comparison to Other Approaches
(A) A schema of the CLR algorithm. The z-score of each regulatory interaction depends on the distribution of MI scores for all possible regulators of the target gene (zi) and on the distribution of MI scores for all possible targets of the regulator gene (zj). (B) Precision and recall for several different network inference methods applied to all genes in the E. coli microarray compendium were calculated using RegulonDB. The number of correctly inferred interactions (within RegulonDB) for each recall value is labeled on the top of the chart. All algorithms performed far better than the random method. Both CLR and relevance networks reach high precisions, but CLR attains almost twice the recall of relevance networks at some levels of precision. (C) Using 60 well-chosen arrays, we can infer a network nearly equivalent in recall and precision to the network inferred using all 445 microarrays in the compendium (dotted horizontal line), reflecting the redundancy of the compendium and the potential for improvement in choosing subsequent perturbations to profile.
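
The scoring step described in panel (A) can be sketched roughly as follows. This is a minimal illustration, assuming a precomputed symmetric matrix of mutual information (MI) values. The function name `clr_scores`, the clipping of negative z-scores to zero, and the combination of the two z-scores as sqrt(zi^2 + zj^2) are assumptions made for this sketch; the caption states only that each score depends on both background distributions.

```python
import math
from statistics import mean, pstdev


def clr_scores(mi):
    """Score each gene pair against the MI background of both genes.

    mi: symmetric square list of lists of mutual information values
        (hypothetical input; the diagonal is ignored).
    Returns a matrix of combined z-scores of the same shape.
    """
    n = len(mi)

    # Background for each gene: mean and standard deviation of its MI
    # values with every other gene.
    mu, sigma = [], []
    for i in range(n):
        others = [mi[i][j] for j in range(n) if j != i]
        mu.append(mean(others))
        sigma.append(pstdev(others))

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # z-score of MI(i, j) within gene i's background and within
            # gene j's background; negative values are clipped to zero
            # (an assumption of this sketch).
            zi = max(0.0, (mi[i][j] - mu[i]) / sigma[i]) if sigma[i] > 0 else 0.0
            zj = max(0.0, (mi[i][j] - mu[j]) / sigma[j]) if sigma[j] > 0 else 0.0
            scores[i][j] = math.hypot(zi, zj)
    return scores


# Toy MI matrix for four hypothetical genes (values are illustrative only).
toy_mi = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.2],
    [0.1, 0.1, 0.2, 0.0],
]
toy_scores = clr_scores(toy_mi)
```

In this toy input, the pair (2, 3) has a much lower raw MI than the pair (0, 1), yet both stand out equally against their own backgrounds and receive a score of about 2.0. Scoring relative to each gene's background, rather than applying one global MI threshold, is the idea the caption describes.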
Figure 3. The Transcriptional Regulatory Map Inferred by CLR with an Estimated 60% Precision
The precision of the network is obtained by measuring the percentage of correctly inferred edges (blue lines) out of all the predicted edges for genes with known connectivity (blue lines and green lines). The green edges represent false positives based on RegulonDB. The red edges connect genes/regulators not present in RegulonDB. A portion of the regulatory map containing many of the Lrp interactions is shown in the expanded box. Dotted lines were tested by ChIP. Magenta and cyan dotted lines are previously unknown targets of Lrp, experimentally verified by ChIP. Genes attached to cyan lines previously had no known regulator, whereas magenta indicates a gene that had at least one previously known regulator.
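
The precision calculation described in this caption can be sketched as below. The function name, the edge and gene identifiers, and the rule that an edge is scorable only when both its regulator and its target appear in the reference database are assumptions for illustration, not details taken from the paper.

```python
def network_precision(predicted, known_edges, known_genes):
    """Split predicted edges into true positives, false positives, and novel.

    predicted:   list of (regulator, target) pairs from the inferred network
    known_edges: set of (regulator, target) pairs in the reference database
    known_genes: set of genes/regulators present in the reference database
    """
    # Only edges whose endpoints both have known connectivity are scored
    # (the blue and green edges in the caption).
    scorable = [(tf, tg) for tf, tg in predicted
                if tf in known_genes and tg in known_genes]
    true_pos = sum(1 for edge in scorable if edge in known_edges)   # blue
    false_pos = len(scorable) - true_pos                            # green
    novel = len(predicted) - len(scorable)                          # red
    precision = true_pos / len(scorable) if scorable else float("nan")
    return precision, true_pos, false_pos, novel


# Hypothetical identifiers, for illustration only.
predicted = [("tfA", "g1"), ("tfA", "g2"), ("tfB", "g3"), ("tfC", "g9")]
known_edges = {("tfA", "g1"), ("tfB", "g3"), ("tfB", "g2")}
known_genes = {"tfA", "tfB", "g1", "g2", "g3"}

precision, tp, fp, novel = network_precision(predicted, known_edges, known_genes)
```

Here two of the three scorable edges are in the reference set, so the precision is 2/3, and the fourth edge is counted as novel rather than as a false positive.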
Figure 4. Annotation of Transcription Factor Function by Functional Enrichment Using Predicted Targets from the 60% Precise Network
The functional categories of the target genes of each transcription factor were tested for enrichment by a hypergeometric test. Enriched functions indicate which aspects of cellular physiology were most represented in the inferred regulatory interactions. These enriched categories also reflect the conditions sampled in the microarray compendium.
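
A hypergeometric enrichment test of the kind mentioned in this caption can be sketched as follows. The function name and parameter names are hypothetical, and the one-sided upper-tail form is an assumption; the caption does not specify the exact test settings or any multiple-testing correction.

```python
from math import comb


def hypergeom_enrichment_p(genome_size, category_size, n_targets, n_hits):
    """Upper-tail hypergeometric p-value.

    Probability of seeing at least n_hits genes from a functional category
    of size category_size among n_targets genes drawn without replacement
    from a genome of genome_size genes.
    """
    total = comb(genome_size, n_targets)
    upper = min(category_size, n_targets)
    tail = sum(
        comb(category_size, x) * comb(genome_size - category_size, n_targets - x)
        for x in range(n_hits, upper + 1)
    )
    return tail / total


# Illustrative numbers only: 10 genes, 3 in the category, 3 predicted targets,
# all 3 of which fall in the category.
p_value = hypergeom_enrichment_p(10, 3, 3, 3)
```

With these toy numbers the p-value is 1/120, since only one of the 120 possible three-gene draws consists entirely of category members.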
Figure 5. Experimental Validation of Inferred Regulatory Interactions
Global precision scores determined with RegulonDB for a set of 268 regulatory interactions were in good correspondence with the local precision scores determined via RegulonDB plus ChIP for three transcription factors. The blue bar indicates inferred interactions that are true positives based on RegulonDB and ChIP. The green bar shows the number of inferred interactions not in RegulonDB that were positive for ChIP, representing 21 new experimentally verified regulatory interactions. The red bar shows inferred interactions that are false positives based on RegulonDB and ChIP.
Figure 6. Analysis of the Regulation of the fecABCDE Iron Transport Operon
All expression data are log2 transformed and RNA normalized. (A) Fur shows no correlation to the fecA operon, one of its known target operons. (B) FecI shows correlation to its known operon target fecA with a bifurcation that suggests combinatorial regulation by another transcription factor. (C) PdhR, a regulator of pyruvate metabolism, is not known to regulate the fecA operon. However, their expression values are correlated in the compendium. (D) Alignment of known PdhR binding motifs with fecA promoter. The known FecI binding motif is further downstream. (E) A schema of the new proposed regulatory structure of the fecABCDE operon. (F) Viewing the expression of fecA (the z-axis is represented as color changes corresponding to the values on the color bar on the right) as a function of both transcription factors suggests its regulation by FecI and PdhR might be AND-like. (G) pdhR expression is highly dependent on the concentration of pyruvate in the media. Expression values exhibit high uncertainty at the threshold pyruvate concentration of 0.2% (represented by vertical error bars), suggesting a bifurcation of cells into high and low expression states. (H) fecA expression was measured at 16 concentrations of two chemicals, citrate and pyruvate, known to alter the expression of fecI and pdhR, respectively. The results further support the hypothesis that fecA expression is controlled with AND-like behavior by FecI and PdhR. fecA expression exhibits high uncertainty at 0.25 mM citrate and 0.2% pyruvate. As with pdhR expression in (G), this high uncertainty may reflect the probabilistic nature of induction near the switching threshold.
