Genome Biol. 2019 Jan 3;20(1):1.
doi: 10.1186/s13059-018-1612-0.

The Network of Cancer Genes (NCG): a comprehensive catalogue of known and candidate cancer genes from cancer sequencing screens

Dimitra Repana et al. Genome Biol. 2019.

Abstract

The Network of Cancer Genes (NCG) is a manually curated repository of 2372 genes whose somatic modifications have known or predicted cancer driver roles. These genes were collected from 275 publications, including two sources of known cancer genes and 273 cancer sequencing screens of more than 100 cancer types from 34,905 cancer donors and multiple primary sites. This represents a more than 1.5-fold content increase compared to the previous version. NCG also annotates properties of cancer genes, such as duplicability, evolutionary origin, RNA and protein expression, miRNA and protein interactions, and protein function and essentiality. NCG is accessible at http://ncg.kcl.ac.uk/ .

Keywords: Cancer genes; Cancer genomics screens; Cancer heterogeneity; Systems-level properties.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Manual curation of cancer genes in NCG. a Pipeline used for adding cancer genes to NCG. Two sources of known cancer genes [19, 53] were integrated leading to 711 known cancer genes. In parallel, 273 publications describing cancer sequencing screens were reviewed to extract 2088 cancer genes. The non-redundant union of these two sets led to 2372 cancer genes currently annotated in NCG. b Intersection between known and candidate cancer genes in NCG. c Comparison of NCG content with the previous version [11]. d Pie chart of the methods used to identify cancer genes in the 273 publications. The total is greater than 273 because some studies used more than one method (Additional file 2: Table S2). e Cancer genes as a function of the number of cancer donors per study. The grey inset shows a magnification of the left bottom corner of the plot. f Number of methods used to identify cancer genes over time. PanSoftware used in one of the pan-cancer studies [6] was considered as a single method but is in fact a combination of 26 prediction tools
Fig. 2
Distribution of cancer genes across primary sites and cancer donors. a Number of total cancer genes and proportion of known and candidate cancer genes across the 31 tumor primary sites analyzed in the 267 cancer-specific studies. The number of cancer donors followed by the number of cancer genes is given in brackets for each primary site. b Proportion of candidate cancer genes over all cancer genes across the 31 tumor primary sites. The dot size is proportional to the donor cohort size. c Total number of cancer genes and cancer donors across the 31 tumor primary sites. The color scale in (b) and (c) indicates the number of screens for each primary site
Fig. 3
Recurrence of cancer genes across primary sites and publications. a Proportion of study-specific cancer genes reported by each of the seven skin melanoma screens. b Total number of cancer genes and donors across 24 cancer types of the blood. The full list of blood cancer types is reported in Additional file 2: Table S2. c Number of primary sites in which each known or candidate cancer gene was reported to be a driver. d Number of publications in which each known or candidate cancer gene was reported to be a driver. e Number of methods used to predict cancer genes for drivers found in more than one publication. f Intersection of cancer genes in the cancer-specific and pan-cancer studies. g Venn diagram of cancer genes across the four pan-cancer studies of adult donors. h Intersection of cancer genes in pan-cancer screens of adult and pediatric donors. In f, g, and h, the number of donors followed by the total number of cancer genes is given in brackets
Fig. 4
Systems-level properties of cancer genes. a Percentage of genes with ≥ 1 gene duplicate covering ≥ 60% of the protein sequence. b Proportion of genes originating in pre-metazoan species. c, d Number of human tissues in which genes (c) and proteins (d) are expressed. In panel c, tissue types were matched between GTEx and Protein Atlas wherever possible, giving 43 unique tissues. In tissues represented in both datasets, genes were defined as expressed if they had ≥ 1 TPM in both datasets. Only genes present in both sources were compared (Additional file 2: Table S1). e Percentage of genes essential in ≥ 1 cell line and distribution of cell lines in which each gene is essential. Only genes with concordant annotation between OGEE and PICKLES were compared (Additional file 2: Table S1). f Percentage of proteins involved in ≥ 1 protein complex. g Median values of betweenness (centrality), clustering coefficient (clustering), and degree (connectivity) of human proteins in the protein-protein interaction network. h Median values of betweenness and degree of the target genes in the miRNA-target interaction network. The clustering coefficient is zero for all nodes, because interactions occur between miRNAs and target genes. Known, candidate, and all cancer genes were compared to the rest of human genes, while TSGs were compared to OGs. Significance was calculated using a two-sided Fisher test (a, b, e, f) or Wilcoxon test (c, d, g, h). *p < 0.05, **p < 0.01, ***p < 0.001. Enrichment and depletion of cancer genes in representative functional categories taken from level 1 of Reactome (i) and level 2 of KEGG (j). Significance was calculated comparing each group of cancer genes to the rest of human genes using a two-sided Fisher test. False discovery rates were calculated in each gene set separately. Only pathways showing enrichment or depletion are shown. The full list of pathways is provided in Additional file 2: Table S3
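
The Fig. 1a caption describes a non-redundant union of two gene sets. As a minimal sketch of that arithmetic, the snippet below applies inclusion-exclusion to the three counts quoted in the caption (711, 2088, 2372) to derive the size of the overlap shown in Fig. 1b. The derived figures follow only from those three numbers and are not quoted from the article. The function name and the toy gene identifiers are hypothetical and used purely for illustration; this is not the curation pipeline itself.

```python
# Counts quoted in the Fig. 1a caption.
KNOWN = 711          # known cancer genes from the two curated sources
FROM_SCREENS = 2088  # cancer genes extracted from the 273 sequencing screens
UNION = 2372         # non-redundant union annotated in NCG


def overlap_size(a: int, b: int, union: int) -> int:
    """Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|."""
    return a + b - union


# Genes present in both sets (derived, not quoted from the article).
shared = overlap_size(KNOWN, FROM_SCREENS, UNION)

# Genes reported only by the screens, i.e. not already in the known set.
screen_only = FROM_SCREENS - shared

# The same operation on toy sets with hypothetical placeholder identifiers.
known_genes = {"GENE_A", "GENE_B", "GENE_C"}
screen_genes = {"GENE_B", "GENE_C", "GENE_D", "GENE_E"}
merged = known_genes | screen_genes        # non-redundant union
in_both = known_genes & screen_genes       # overlap between the two sources

print(shared, screen_only, len(merged), len(in_both))
```

As a consistency check, the known genes plus the screen-only genes should add up to the union reported in the caption.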

References

    1. Consortium ICG. International network of cancer genome projects. Nature. 2010;464:993. doi: 10.1038/nature08987.
    2. Tomczak K, Czerwińska P, Wiznerowicz M. The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemp Oncol. 2015;19:A68.
    3. Nakagawa H, Fujita M. Whole genome sequencing analysis for cancer genomics and precision medicine. Cancer Sci. 2018;109:513–522. doi: 10.1111/cas.13505.
    4. Poulos RC, Wong JW. Finding cancer driver mutations in the era of big data research. Biophys Rev. 2018;10:1–9.
    5. Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45:1113–1120. doi: 10.1038/ng.2764.