Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2010 Jan;38(Database issue):D716-25.
doi: 10.1093/nar/gkp1015. Epub 2009 Nov 24.

GeneSigDB--a curated database of gene expression signatures

Affiliations

GeneSigDB--a curated database of gene expression signatures

Aedín C Culhane et al. Nucleic Acids Res. 2010 Jan.

Abstract

The primary objective of most gene expression studies is the identification of one or more gene signatures; lists of genes whose transcriptional levels are uniquely associated with a specific biological phenotype. Whilst thousands of experimentally derived gene signatures are published, their potential value to the community is limited by their computational inaccessibility. Gene signatures are embedded in published article figures, tables or in supplementary materials, and are frequently presented using non-standard gene or probeset nomenclature. We present GeneSigDB (http://compbio.dfci.harvard.edu/genesigdb) a manually curated database of gene expression signatures. GeneSigDB release 1.0 focuses on cancer and stem cells gene signatures and was constructed from more than 850 publications from which we manually transcribed 575 gene signatures. Most gene signatures (n = 560) were successfully mapped to the genome to extract standardized lists of EnsEMBL gene identifiers. GeneSigDB provides the original gene signature, the standardized gene list and a fully traceable gene mapping history for each gene from the original transcribed data table through to the standardized list of genes. The GeneSigDB web portal is easy to search, allows users to compare their own gene list to those in the database, and download gene signatures in most common gene identifier formats.

PubMed Disclaimer

Figures

Figure 1.
Figure 1.
Curation of gene signatures in GeneSigDB. (A) GeneSigDB hierarchical file structure SigID.txt, SigID-mapping.txt, SigID-maptrace.txt, SigID-standardized.txt and SigID-index.xml. These files are respectively, the original gene signature, the mappable gene identifiers from SigID.txt, the mapping-trace showing how each gene was mapped, a list of EnsEMBL gene identifiers that correspond to genes in SigID.txt, and xml annotation of SigID.txt. (B) An example xml gene signature annotation file.
Figure 2.
Figure 2.
Overlap in gene signatures across tumor types. (A) Heatmap-style representation of gene overlap. Each row is a gene and each column is a tissue type. Presence of a gene is indicated by a black line, and absence is white. It can be seen that some gene maybe linked to phenotypic subclasses in many tumors, but there are generally many more tumor-specific genes, likely indicating the effect of the tissue of origin. (B) Dendrogram of hierarchical cluster analysis that was performed using a Sorensen's coefficient asymmetrical measure of binary distance and joined using Ward's minimum variance method.
Figure 3.
Figure 3.
Visualization of gene overlap in gene signatures. In this example, we searched for Fanconi anemia-related genes by performing a gene search for FANC* which returned 12 human genes and 2 mouse genes. This screen shows the overlap between gene signatures in which these 12 genes are present in at least 2/12. Red and grey boxes indicate presence or absence, respectively.

Similar articles

Cited by

References

    1. Oron AP, Jiang Z, Gentleman R. Gene set enrichment analysis using linear models and diagnostics. Bioinformatics. 2008;24:2586–2591. - PMC - PubMed
    1. Goeman JJ, Bühlmann P. Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics. 2007;23:980–987. - PubMed
    1. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA. 2005;102:15545–15550. - PMC - PubMed
    1. Ross JS. Multigene classifiers, prognostic factors, and predictors of breast cancer clinical outcome. Adv. Anat. Pathol. 2009;16:204–215. - PubMed
    1. Sparano JA, Paik S. Development of the 21-gene assay and its application in clinical practice and clinical trials. J. Clin. Oncol. 2008;26:721–728. - PubMed

Publication types