BMC Bioinformatics. 2005 Sep 14:6:227.
doi: 10.1186/1471-2105-6-227.

Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks

Cecily J Wolfe et al. BMC Bioinformatics.

Abstract

Background: Biological processes are carried out by coordinated modules of interacting molecules. As clustering methods demonstrate that genes with similar expression display increased likelihood of being associated with a common functional module, networks of coexpressed genes provide one framework for assigning gene function. This has informed the guilt-by-association (GBA) heuristic, widely invoked in functional genomics. Yet although the idea of GBA is accepted, the breadth of GBA applicability is uncertain.

Results: We developed methods to systematically explore the breadth of GBA across a large and varied corpus of expression data to answer the following question: To what extent is the GBA heuristic broadly applicable to the transcriptome, and conversely, how broadly is GBA captured by a priori knowledge represented in the Gene Ontology (GO)? Our study investigates the functional organization of five coexpression networks using data from three mammalian organisms. Our method calculates a probabilistic score between each gene and each Gene Ontology category that reflects coexpression enrichment of a GO module. For each GO category we use Receiver Operating Characteristic (ROC) curves to assess whether these probabilistic scores reflect GBA. This methodology, applied to five different coexpression networks, demonstrates that the signature of guilt-by-association is ubiquitous and reproducible and that the GBA heuristic is broadly applicable across the population of nine hundred Gene Ontology categories. We also demonstrate the existence of highly reproducible patterns of coexpression between some pairs of GO categories.

Conclusion: We conclude that GBA has universal value and that transcriptional control may be more modular than previously realized. Our analyses also suggest that methodologies combining coexpression measurements across multiple genes in a biologically defined module can aid in characterizing gene function or in characterizing whether pairs of functions operate together.

Figures

Figure 1
Schematic representation of the steps in our analyses. (a) Example flow chart of the different steps for calculating gene set coexpression enrichment Pe values between each of the 6624 genes in the multi-species network and 902 GO sets. For each gene mi we use the hypergeometric distribution to calculate a coexpression enrichment Pe-value (Pe(mi, gj)) for whether GO set gj was significantly overrepresented in the top 250 genes with smallest Pc-values to mi. (b) The four steps in our analyses. 1. A coexpression network is generated with Pc values (multi-species network) or correlation coefficients (single-species network) scoring coexpression between gene pairs. 2. Coexpression enrichment Pe values are calculated between each gene and each GO category, such as between GO category 1 and genes A, B, and C and between GO category 2 and genes A, B, and C. 3. A score reflecting GBA is calculated for each GO category (e.g., GO category 1). 4. The interrelationship between pairs of GO categories is quantified, such as that between GO category 1 and GO category 2, which are sibling categories in a Gene Ontology graph, sharing GO category 3 as a common parent.
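The enrichment step in panel (a) can be illustrated with a minimal sketch. It assumes the Pe-value is the upper tail of a hypergeometric distribution (the probability of seeing at least the observed number of GO-set members among the top-ranked neighbors); the caption does not say which tail is used or whether the query gene itself is excluded from the counts. The function names `hypergeom_upper_tail` and `coexpression_enrichment` are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_genes, set_size, n_drawn):
    """P(X >= k) for X ~ Hypergeometric(n_genes, set_size, n_drawn).

    n_genes  : genes in the network (6624 in the caption)
    set_size : genes annotated to the GO set
    n_drawn  : top-ranked neighbors inspected (250 in the caption)
    """
    lo = max(k, 0)
    hi = min(set_size, n_drawn)
    total = comb(n_genes, n_drawn)
    # Exact integer arithmetic; a single division at the end avoids float overflow.
    favourable = sum(
        comb(set_size, i) * comb(n_genes - set_size, n_drawn - i)
        for i in range(lo, hi + 1)
    )
    return favourable / total


def coexpression_enrichment(top_neighbors, go_set, n_genes):
    """Assumed form of Pe(m_i, g_j): tail probability of the observed overlap
    between a gene's top neighbors and the GO set."""
    overlap = len(set(top_neighbors) & set(go_set))
    return hypergeom_upper_tail(overlap, n_genes, len(set(go_set)), len(set(top_neighbors)))
```

A small Pe-value here means the GO set is overrepresented among the gene's closest coexpression neighbors. With the caption's sizes, the call would take 250 neighbor identifiers and `n_genes=6624`.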
Figure 2
Examples from the multi-species network. (a-d) Self-diagnostic Receiver Operating Characteristic (ROC) curves for the GO categories shown above.
Figure 3
Histograms of self-diagnostic ROC areas for the multi-species network. (a) Histogram for biological process GO categories. (b) Histogram for cellular component GO categories. (c) Histogram for molecular function GO categories. (d) Histogram for randomized GO sets. (e) Histogram for a randomized multi-species coexpression network.
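A ROC area of the kind histogrammed here can be sketched as follows. This is an illustrative calculation, not the authors' code: it assumes that genes annotated to a GO category are the positives, that each gene is scored by its enrichment Pe-value for that category (negated so that a smaller Pe-value ranks higher), and that ties count as half. The name `roc_area` is hypothetical.

```python
def roc_area(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores : one number per gene, higher meaning "more likely a member"
    labels : truthy for genes annotated to the GO category, falsy otherwise
    """
    pos = [s for s, is_member in zip(scores, labels) if is_member]
    neg = [s for s, is_member in zip(scores, labels) if not is_member]
    if not pos or not neg:
        raise ValueError("need at least one member and one non-member")
    # Fraction of (member, non-member) pairs ranked correctly; ties count 0.5.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Pe-values for four genes; the first and third belong to the category.
pe_values = [0.001, 0.5, 0.01, 0.9]
membership = [True, False, True, False]
area = roc_area([-p for p in pe_values], membership)
```

An area near 0.5 indicates that the scores do not separate members from non-members, which is the behavior one would expect for randomized GO sets or a randomized network; an area near 1 indicates strong separation. The pairwise loop is quadratic and is meant for clarity rather than speed.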
Figure 4
Histograms of cross-diagnostic ROC areas for the multi-species network. (a) Histogram of ROC areas for whether descendent Pe-values are diagnostic of parent sets. (b) Histogram of ROC areas for cross pairing of sibling categories. (c) Histogram of ROC areas for all cross pairings of categories (excluding parent-descendent pairs) with distances of 3–16 in a GO graph. GO organizes categories as nodes on a graph and calculates the distance between category pairs on the same graph as the minimum number of arcs needed to traverse from one category node to another on the graph. For example, a parent and its child are separated by a distance of one and siblings are separated by distances of two. (d) Histogram of ROC areas for all cross pairings of categories (excluding parent-descendent pairs) with distances of 3–16 in a GO graph, created using randomized GO sets. (e) Histogram of cross-diagnostic ROC areas between GO category pairs (excluding parent-descendent pairs) in the subgraph below immune response. (f) Histogram of cross-diagnostic ROC areas between GO category pairs (excluding parent-descendent pairs) in the subgraph below cell cycle.
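The graph distance defined in panel (c) can be computed with a breadth-first search. The sketch below assumes that arcs may be traversed in either direction, which is what makes siblings two arcs apart; the function name `go_distance` and the toy category labels are hypothetical.

```python
from collections import deque


def go_distance(parent_child_arcs, start, goal):
    """Minimum number of arcs between two categories, or None if unconnected.

    parent_child_arcs : iterable of (parent, child) pairs from one GO graph
    """
    # Build an undirected adjacency map so paths can go up and down the graph.
    neighbors = {}
    for parent, child in parent_child_arcs:
        neighbors.setdefault(parent, set()).add(child)
        neighbors.setdefault(child, set()).add(parent)

    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Toy graph: category 3 is the common parent of sibling categories 1 and 2.
arcs = [("GO3", "GO1"), ("GO3", "GO2")]
parent_child = go_distance(arcs, "GO3", "GO1")  # 1
siblings = go_distance(arcs, "GO1", "GO2")      # 2
```

Breadth-first search visits nodes in order of increasing arc count, so the first time the goal is reached the count is minimal.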
Figure 5
Plots of self-diagnostic ROC areas from the multi-species network (x-axis) versus ROC areas from a single-species network (y-axis) for each GO category. Each panel examines one of the single-species networks, created using microarrays from the following Affymetrix platforms: HG-U133A (human), HG-U95A (human), MG-U74A (mouse), and RG-U34A (rat). Correlation coefficients are noted in the upper left corner of the plots.
